Logging and Monitoring
- rsyslog
- syslog (RFC 5424)
  - /var/log/syslog
  - /var/log
- eBPF
- Zabbix
- InfluxDB
- Grafana
- Prometheus (+ Grafana + Loki as a stack)
- TimescaleDB
- Alertmanager
- Loki
- Graphite
- Spiceworks
- CrowdSec
- Netdata
- Node Exporter (Prometheus node_exporter)
- ELK (Elasticsearch, Logstash, Kibana)
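To make the syslog (RFC 5424) entry concrete, here is a minimal Python sketch that formats a single RFC 5424 message. It is an illustration under stated assumptions, not rsyslog's or any library's API: `format_rfc5424` is a hypothetical helper, only a subset of facility codes is included, structured data is always the nil value `-`, and the optional UTF-8 BOM before the message is omitted. Sending the line to a collector (UDP, TCP or a local socket) is not shown.

```python
from datetime import datetime, timezone

# Numeric codes from RFC 5424, section 6.2.1 (only a subset of facilities).
FACILITIES = {"kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4,
              "syslog": 5, "cron": 9, "local0": 16}
SEVERITIES = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}
NILVALUE = "-"  # placeholder for an empty header field


def format_rfc5424(facility, severity, msg, hostname=NILVALUE,
                   app_name=NILVALUE, procid=NILVALUE, msgid=NILVALUE,
                   timestamp=None):
    """Build one RFC 5424 line: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME
    PROCID MSGID STRUCTURED-DATA MSG (structured data left empty)."""
    # PRI packs facility and severity into one number.
    pri = FACILITIES[facility] * 8 + SEVERITIES[severity]
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    # RFC 3339 timestamp in UTC with millisecond precision and a 'Z' suffix.
    ts = (timestamp.astimezone(timezone.utc)
          .isoformat(timespec="milliseconds").replace("+00:00", "Z"))
    header = f"<{pri}>1 {ts} {hostname} {app_name} {procid} {msgid} {NILVALUE}"
    return f"{header} {msg}" if msg else header
```

Called with the values from the RFC's first example (facility `auth`, severity `crit`, hostname `mymachine.example.com`, app name `su`, MSGID `ID47`, timestamp 2003-10-11T22:14:15.003Z, message `'su root' failed for lonvick on /dev/pts/8`), it returns `<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed for lonvick on /dev/pts/8`, which is the RFC's sample line without the BOM. The PRI value 34 is facility * 8 + severity, i.e. 4 * 8 + 2.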
Logs vs. metrics overview: https://grafana.com/blog/2016/01/05/logs-and-metrics-and-graphs-oh-my/
Setting up Grafana: https://grafana.com/docs/grafana/latest/setup-grafana/installation/docker/
Setting up Prometheus: https://github.com/prometheus/prometheus
Some things to measure:
- apt status (security/critical updates that haven't been applied yet)
- reboot needed (presence of /var/run/reboot-required)
- fail2ban jail status (how many IPs are banned in each of our defined jails)
- CPU usage
- MySQL: active and long-running processes, number of queries
- disk I/O statistics (iostat numbers)
- disk space
- SSL cert expiration date
- domain expiration date
- reachability (ping, domain resolution, a specific string in an HTTP response)
- application-specific checks (WordPress, Drupal, CRM, etc.)
- postfix queue size
- apt/yum/fwupd/... pending updates
- mail queue length and root's mailbox size: growth here often indicates something failing silently
- pending reboot after kernel update
- certain kinds of log entries (block device read errors, OOM kills, core dumps)
- network checksum errors, dropped packets, martian packets
- presence or absence of USB devices: desktops should have a keyboard and mouse, servers usually shouldn't, and USB storage is sometimes forbidden
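Two of the checks in this list are simple enough to sketch with the Python standard library alone: a pending reboot (the `/var/run/reboot-required` flag file) and disk space. The helper names, the 80%/90% warning and critical thresholds, and the OK/WARNING/CRITICAL labels are assumptions chosen for illustration, not values from any particular monitoring tool.

```python
import os
import shutil


def reboot_required(flag_path="/var/run/reboot-required"):
    """True if the reboot flag file exists (created on Debian/Ubuntu
    when an installed update needs a reboot)."""
    return os.path.exists(flag_path)


def disk_used_percent(path="/"):
    """Used space of the filesystem holding `path`, computed like df's Use%:
    used / (used + available), so root-reserved blocks are left out."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    return 0.0 if denominator == 0 else 100.0 * usage.used / denominator


def classify(percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a status label (thresholds are arbitrary)."""
    if percent >= crit:
        return "CRITICAL"
    if percent >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    pct = disk_used_percent("/")
    print(f"disk /: {pct:.1f}% used, {classify(pct)}")
    print("reboot required" if reboot_required() else "no reboot pending")
```

The percentage deliberately uses `used + free` rather than the raw filesystem size, because `shutil.disk_usage` reports `free` as the space available to unprivileged users; that keeps the number in line with what `df` shows.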
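For the SSL certificate expiration check, a minimal sketch with Python's `ssl` module: connect, read the certificate's `notAfter` field, and convert it to days remaining with `ssl.cert_time_to_seconds`. The function names are hypothetical, and only the certificate presented on the given host and port is examined.

```python
import socket
import ssl
import time


def cert_not_after(host, port=443, timeout=10.0):
    """Return the notAfter field of the certificate `host` presents.

    Uses a verifying context, so an expired or otherwise invalid
    certificate raises ssl.SSLCertVerificationError instead of returning.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until(not_after, now=None):
    """Days left before a notAfter string such as 'Jan  5 09:34:43 2018 GMT'
    (negative once that moment has passed)."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0
```

For instance, `days_until(cert_not_after("example.com"))` gives the days left for that host (network access required; the host is only a placeholder). Because the context verifies the certificate, an already-expired one surfaces as an `ssl.SSLCertVerificationError` rather than a negative number, so a check should treat that exception as critical.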
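The log-entry check (and the martian-packet part of the network check) comes down to counting lines that match known message patterns. This sketch assumes kernel and systemd messages with the wording noted in the comments. Those strings vary between kernel versions and distributions, so the regular expressions and the `scan_log` helper are hypothetical starting points rather than a complete list.

```python
import re
from collections import Counter

# Example patterns only: exact wording differs between kernel/systemd versions.
PATTERNS = {
    # "Out of memory: Killed process ..." (older kernels: "Kill process")
    "oom_kill": re.compile(r"Out of memory: Kill(ed)? process"),
    # Block layer read/write failures, e.g. "I/O error, dev sda, sector ..."
    "io_error": re.compile(r"I/O error, dev \S+"),
    # systemd-coredump reports "... dumped core."
    "core_dump": re.compile(r"dumped core"),
    # Only logged when net.ipv4.conf.*.log_martians is enabled.
    "martian": re.compile(r"martian (source|destination)"),
}


def scan_log(lines):
    """Count how many lines match each pattern in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

On a Debian-style system with rsyslog, the lines of `/var/log/kern.log` (an assumed path) or the output of `journalctl -k` could be passed in; a non-zero count for any key would then raise an alert.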