Monitoring Stack

See Everything, Log Everything

Two core monitoring systems run on cainfra01 (Rocky Linux 10.1) as Docker containers:

  • LibreNMS — network monitoring via SNMP (what’s up, what’s slow, what’s broken)
  • Graylog — centralized log management (what happened, when, and why)

Together they cover both metrics and logs for every device in the lab.


LibreNMS — Network Monitoring

URL: http://librenms.rpc-cyberflight.com

What It Monitors

Every device in the lab reports metrics via SNMP:

Device     IP           SNMP Version  Metrics
pve1       192.168.x.x  v3 (SHA/AES)  CPU, RAM, disk, network, temperature
pve2       192.168.x.x  v3 (SHA/AES)  CPU, RAM, disk, network, temperature
bighost    192.168.x.x  v3 (SHA/AES)  CPU, RAM, disk, network, temperature
cainfra01  192.168.x.x  v3 (SHA/AES)  CPU, RAM, disk, network
RonClaw    192.168.x.x  v3 (SHA/AES)  CPU, RAM, disk, network
BigBrain   192.168.x.x  v3 (SHA/AES)  CPU, RAM, disk, network
CADC01     192.168.x.x  v2c           CPU, RAM, disk, network
cadc02     192.168.x.x  v2c           CPU, RAM, disk, network
GL-MT6000  192.168.x.x  v2c           CPU, RAM, interfaces

SNMP v3 vs v2c

SNMPv3 uses authentication (SHA) and encryption (AES) — credentials are never sent in cleartext. All Linux and Proxmox hosts use v3.

Windows Server 2019 and OpenWrt don’t support SNMPv3 natively, so those devices use SNMPv2c with a 32-character random community string. Because SNMPv2c sends the community string unencrypted, each of these devices is also restricted to accept queries only from LibreNMS (192.168.x.x).
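
To make the difference between the two versions concrete, here is a minimal Python sketch that generates a 32-character community string and builds the matching net-snmp snmpget command lines. The function names, the user name, and the passphrases are hypothetical placeholders, not the lab's actual values. The sketch assumes the net-snmp command-line tools are installed on the machine running the query.

```python
import secrets
import string


def random_community(length: int = 32) -> str:
    """Return a random alphanumeric SNMPv2c community string."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def snmpget_v3(host: str, oid: str, user: str,
               auth_pass: str, priv_pass: str) -> list[str]:
    """Build an snmpget command that authenticates with SHA and encrypts with AES."""
    return [
        "snmpget", "-v3", "-l", "authPriv",  # authPriv = authentication + privacy
        "-u", user,
        "-a", "SHA", "-A", auth_pass,
        "-x", "AES", "-X", priv_pass,
        host, oid,
    ]


def snmpget_v2c(host: str, oid: str, community: str) -> list[str]:
    """Build an snmpget command for SNMPv2c; the community string is the only secret."""
    return ["snmpget", "-v2c", "-c", community, host, oid]


# 1.3.6.1.2.1.1.3.0 is sysUpTime, a harmless OID for a first connectivity check.
# Either list can be passed to subprocess.run(cmd, capture_output=True, text=True).
```

Secrets passed on the command line are visible in the process list, so for anything beyond a quick test it may be preferable to keep them in a net-snmp client configuration file.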

Custom Extend Scripts

Proxmox hosts run custom SNMP extend scripts in /opt/snmp-scripts/ for metrics that standard SNMP MIBs don’t cover:

  • cpu-temp — CPU temperature monitoring
  • smart-status — disk health via S.M.A.R.T.
  • lvm-usage — LVM thin pool utilization
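
The scripts themselves are not shown here, so the following is only a sketch of what a cpu-temp style extend script could look like in Python. It assumes the kernel exposes the CPU temperature at /sys/class/thermal/thermal_zone0/temp in millidegrees Celsius; the real script may read a different zone or use lm-sensors instead. An extend script only needs to print its value to standard output. With net-snmp it would typically be registered in snmpd.conf with a line such as `extend cpu-temp /opt/snmp-scripts/cpu-temp`.

```python
#!/usr/bin/env python3
"""Hypothetical cpu-temp extend script: print the CPU temperature in degrees Celsius."""
import sys
from pathlib import Path

# Assumption: thermal_zone0 is the CPU sensor on this host.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")


def millidegrees_to_celsius(raw: str) -> int:
    """Convert the sysfs reading (millidegrees Celsius) to whole degrees."""
    return round(int(raw.strip()) / 1000)


def main() -> None:
    try:
        print(millidegrees_to_celsius(THERMAL_ZONE.read_text()))
    except (OSError, ValueError) as error:
        # Print nothing on stdout so the poller does not record a bogus value.
        print(f"cpu-temp: {error}", file=sys.stderr)


if __name__ == "__main__":
    main()
```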

Architecture

LibreNMS (Docker)
├── librenms/librenms:latest — web UI + poller (:8000)
├── librenms/librenms:latest — dispatcher sidecar
├── mariadb:10.5 — database
└── redis:7-alpine — caching

Graylog — Log Management

URL: http://graylog.rpc-cyberflight.com

What It Collects

All devices send logs to Graylog for centralized analysis:

Input   Port   Protocol  Sources
Syslog  1514   UDP/TCP   Linux hosts, Proxmox, router
GELF    12201  UDP       Docker containers
Beats   5044   TCP       Winlogbeat (Windows event logs)
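
As an illustration of what arrives on the GELF input, the sketch below builds a GELF 1.1 message in Python and sends it as a single uncompressed UDP datagram to port 12201. The helper names and the example field values are hypothetical. The server name in the usage comment assumes the GELF input is reachable under the same host name as the web UI, which may not match the actual setup.

```python
import json
import socket
import time


def build_gelf(host: str, short_message: str, level: int = 6, **extra: object) -> bytes:
    """Encode a GELF 1.1 message; additional fields get the required '_' prefix."""
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,  # syslog severity, 6 = informational
    }
    for key, value in extra.items():
        message[f"_{key}"] = value
    return json.dumps(message).encode("utf-8")


def send_gelf_udp(payload: bytes, server: str, port: int = 12201) -> None:
    """Send one GELF datagram without chunking, so the payload must stay small."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))


# Usage (assumed server name):
# send_gelf_udp(build_gelf("cainfra01", "test message", service="demo"),
#               "graylog.rpc-cyberflight.com")
```

UDP gives no delivery confirmation, so the quickest check is to search for the test message in the Graylog web UI afterwards.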

Architecture

Graylog (Docker)
├── graylog/graylog:5.2 — web UI + processing (:7777→9000)
├── mongo:6.0 — configuration database
└── opensearchproject/opensearch:2.4.0 — log storage + search

Why Centralized Logging?

When something breaks at 2 AM, you don’t want to SSH into six machines to read log files. Graylog collects everything in one searchable interface. You can:

  • Search across all hosts simultaneously
  • Set up alerts for specific patterns (failed logins, disk errors, service crashes)
  • Correlate events across systems (DNS change → service outage)
  • Retain logs for compliance and forensics
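
The value of correlation is easiest to see with a toy example. The Python sketch below is not how Graylog works internally; it only illustrates the idea by merging per-host event streams into one timeline and filtering it by pattern. The Event type, the function names, and the sample log lines are all invented for illustration.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    host: str
    message: str


def merge_timelines(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge per-host streams (each already in time order) into one timeline."""
    return heapq.merge(*streams, key=lambda event: event.timestamp)


def matching(events: Iterable[Event], pattern: str) -> list[Event]:
    """Return events whose message contains the pattern, ignoring case."""
    return [event for event in events if pattern.lower() in event.message.lower()]


# Invented sample data: a DNS change on one host, a failure on another.
dns_log = [Event(datetime(2025, 1, 1, 2, 0, 5), "CADC01", "DNS record updated")]
app_log = [
    Event(datetime(2025, 1, 1, 2, 0, 1), "cainfra01", "service healthy"),
    Event(datetime(2025, 1, 1, 2, 0, 9), "cainfra01", "service crashed: name resolution failed"),
]

timeline = list(merge_timelines(dns_log, app_log))
crashes = matching(timeline, "crashed")
```

In the merged timeline the DNS change sits directly before the crash, which is the kind of ordering that is hard to spot when each log lives on a different machine.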

What You Learn Building This

  • SNMP — MIBs, OIDs, community strings, v2c vs v3, extend scripts
  • Network monitoring — device discovery, alerting, dashboards, capacity planning
  • Log management — syslog, GELF, Beats, log parsing, search queries
  • Docker Compose — multi-container applications, volumes, networking
  • MariaDB / MongoDB / OpenSearch — database administration fundamentals
  • Observability — the difference between monitoring (is it up?) and observability (why is it slow?)