Monitor number of stale services

jplitza · May 11, 2022, 1:58pm

I ended up with this small script in local/lib/nagios/plugins/check_cmk_stale_services, which I configured as a Nagios plugin check for the monitoring server itself:

#!/bin/sh

set -eu

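# Ask Livestatus (via the lq wrapper) for the number of stale services,
# restricted to active checks (check_type = 0).
NUM_STALE_SERVICES="$(lq 'GET services\nStats: staleness >= 1.5\nFilter: host_state = 1\nFilter: check_type = 0')"
# Thresholds come from the first and second argument, defaulting to 10 and 100.
WARN="${1:-10}"
CRIT="${2:-100}"

# Report UNKNOWN if Livestatus returned anything other than a plain number.
case "$NUM_STALE_SERVICES" in
    ''|*[!0-9]*)
        echo "UNKNOWN - unexpected Livestatus output: ${NUM_STALE_SERVICES}"
        exit 3
        ;;
esac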

echo "${NUM_STALE_SERVICES} stale active services (warn/crit above ${WARN}/${CRIT}) | stale_services=${NUM_STALE_SERVICES};${WARN};${CRIT};0"

if [ "$NUM_STALE_SERVICES" -gt "$CRIT" ]; then
    exit 2
elif [ "$NUM_STALE_SERVICES" -gt "$WARN" ]; then
    exit 1
else
    exit 0
fi

Obviously this only works in single-site installations, but that’s fine for now. I like that I can easily check the number of stale active checks without the count being inflated by passive checks that weren’t updated by the active Check_MK check.
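
For anyone who wants to adjust the query, here is a rough sketch that spells out the same Livestatus request one header per line, which makes the individual filters easier to see. The function name stale_services_query is made up for this illustration and is not part of the script above. The sketch also assumes that lq reads a query from standard input when called without arguments, so please verify that on your site before relying on it.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not in the original script): prints the query
# used above, one Livestatus header per line.
#
#   GET services             -> query the services table
#   Stats: staleness >= 1.5  -> count services whose staleness is at least 1.5
#   Filter: host_state = 1   -> host state as a numeric code
#                               (0 = UP, 1 = DOWN, 2 = UNREACHABLE);
#                               value kept exactly as in the script above
#   Filter: check_type = 0   -> active checks only (1 would be passive)
stale_services_query() {
    printf '%s\n' \
        'GET services' \
        'Stats: staleness >= 1.5' \
        'Filter: host_state = 1' \
        'Filter: check_type = 0'
}

# Print the query for inspection. On a monitoring site it could be piped
# into Livestatus instead, e.g.:  stale_services_query | lq
# (assumption: lq accepts the query on stdin).
stale_services_query
```

Removing the check_type line would presumably count passive services as well, which is the inflated number the script is meant to avoid.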