Add some self hosting notes

jgrogan 2025-02-09 15:34:48 +00:00
parent 1d0689f177
commit 060c7de471
3 changed files with 103 additions and 17 deletions

@@ -0,0 +1,58 @@
# Logging and Monitoring #
* rsyslog
* [syslog](https://en.wikipedia.org/wiki/Syslog): RFC 5424
  * `/var/log/syslog`
  * `/var/log`
* eBPF
* Zabbix
* InfluxDB
* Grafana
* Prometheus (+ Grafana + Loki as a stack)
* TimescaleDB
* Alertmanager
* Loki
* Graphite
* Spiceworks
* CrowdSec
* Netdata
* Node Exporter (Prometheus `node_exporter`)
* ELK: Elasticsearch, Logstash, Kibana

Grafana blog on logs, metrics, and graphs: https://grafana.com/blog/2016/01/05/logs-and-metrics-and-graphs-oh-my/

Setting up Grafana: https://grafana.com/docs/grafana/latest/setup-grafana/installation/docker/

Setting up Prometheus: https://github.com/prometheus/prometheus

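
The syslog entry in the list above points at RFC 5424. As a rough sketch of what that format carries, the snippet below splits the header of an RFC 5424 message and decodes the priority value (PRI = facility * 8 + severity). The function name `parse_header` and the returned dictionary keys are made up for illustration. Note that this describes messages as sent on the wire; what ends up in `/var/log/syslog` depends on the rsyslog template in use and may not include the PRI field at all.

```python
import re

# Severity keywords indexed by numeric code (RFC 5424, section 6.2.1).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID ...
HEADER = re.compile(r"^<(\d{1,3})>(\d{1,3}) (\S+) (\S+) (\S+) (\S+) (\S+)")


def parse_header(line):
    """Return the header fields of an RFC 5424 message, or None if it does not match."""
    match = HEADER.match(line)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # highest valid PRI: facility 23, severity 7
        return None
    facility, severity = divmod(pri, 8)
    return {
        "facility": facility,
        "severity": SEVERITIES[severity],
        "timestamp": match.group(3),
        "hostname": match.group(4),
        "app_name": match.group(5),
        "procid": match.group(6),
        "msgid": match.group(7),
    }
```

For the example message from the RFC, which starts with `<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47`, this yields facility 4 (auth) and severity `crit`, since 34 = 4 * 8 + 2.
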
Some things to measure:
- apt status (security/critical updates not yet installed)
- reboot needed (presence of `/var/run/reboot-required`)
- fail2ban jail status (how many banned IPs are in each defined jail)
- CPU usage
- MySQL: service active, long-running processes, number of queries
- disk I/O statistics from `iostat`
- disk space
- SSL cert expiration date
- domain expiration date
- reachability (ping, domain resolution, specific string in an HTTP response)
- application-specific checks (WordPress, Drupal, CRM, etc.)
- Postfix queue size
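
A few of the checks above (reboot flag, disk space, certificate expiry) can be sketched with the Python standard library alone. This is only a minimal sketch, not a finished collector: all function names are hypothetical, and the output uses the Prometheus text format so it could be written to a `*.prom` file in the directory that `node_exporter` reads via `--collector.textfile.directory`.

```python
import os
import shutil
import socket
import ssl
import time


def reboot_required(flag_path="/var/run/reboot-required"):
    """Return 1 if the reboot flag file exists, else 0."""
    return 1 if os.path.exists(flag_path) else 0


def disk_free_ratio(path="/"):
    """Return the free fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the notAfter string of the certificate served by host.

    The default context verifies the certificate, so an already expired
    certificate raises an ssl.SSLError here instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Convert a notAfter string (e.g. 'Jan  5 09:34:43 2018 GMT') to days left."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def render_metrics(metrics):
    """Render a {name: value} dict as gauges in the Prometheus text format."""
    lines = []
    for name in sorted(metrics):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {metrics[name]}")
    return "\n".join(lines) + "\n"
```

A cron job could then call something like `render_metrics({"selfhost_reboot_required": reboot_required(), "selfhost_disk_free_ratio": disk_free_ratio("/")})` and write the result to a file; the metric names here are placeholders, not an established convention.
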
* pending updates from apt, yum, fwupd, ...
* mail queue length and the size of root's mailbox: both are indicators of things going wrong silently
* pending reboot after a kernel update
* certain kinds of log entries (block device read errors, OOM kills, core dumps)
* network checksum errors, dropped packets, martian packets
* presence or absence of USB devices: desktops should have a keyboard and mouse, servers usually shouldn't, and USB storage is sometimes forbidden
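
For the network error and dropped-packet item above, one possible starting point on Linux is the per-interface counters in `/proc/net/dev`. The sketch below parses that file's text into error and drop counts; the function names are hypothetical. It does not cover martian packets, which the kernel reports through its log rather than through these counters.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: {counter: value}}.

    Each data line holds 8 receive fields followed by 8 transmit fields;
    errs and drop are the 3rd and 4th field of each group.
    """
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        iface, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue
        values = [int(field) for field in fields[:16]]
        stats[iface.strip()] = {
            "rx_errs": values[2],
            "rx_drop": values[3],
            "tx_errs": values[10],
            "tx_drop": values[11],
        }
    return stats


def interfaces_with_problems(stats):
    """Return the sorted names of interfaces with any non-zero error or drop counter."""
    return sorted(name for name, counters in stats.items() if any(counters.values()))


def read_net_dev(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path, encoding="ascii") as handle:
        return parse_net_dev(handle.read())
```

The counters are cumulative since boot, so a monitoring check would normally compare two readings over time rather than alert on any non-zero value.
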
## Further Reading ##
* https://www.redhat.com/en/blog/log-aggregation-rsyslog
# Auto updates #