Monitor number of stale services

Hi @jplitza

I only have an idea that might help: you can query Livestatus information with lq. The
Livestatus tables hosts and services both have a staleness column. So if you query, for
example, your hosts table for the columns name and staleness:

lq "GET hosts\nColumns: name staleness"

you get a list similar to this one (hostnames intentionally modified):

HOST1;0.95
HOST2;0.95
HOST3;0.95
HOST4;0.683333
HOST5;0.95
HOST6;0.0666667

The value on the right keeps rising each time you run the query until it reaches 1 (or slightly above 1), and then drops again. I believe it represents how much of the check interval has elapsed since the last check. If one runs the same query against the services table, filters on staleness > 1, and counts the matches, one might be able to check the number of stale services this way.
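To make the counting idea concrete, here is a minimal Python sketch that takes the semicolon-separated text printed by lq and counts the rows whose staleness exceeds a threshold. The function name count_stale and the default threshold of 1.0 are my own assumptions for illustration (the threshold follows the guess above; the value your site treats as stale may differ).

```python
def count_stale(lq_output: str, threshold: float = 1.0) -> int:
    """Count rows whose last field (the staleness) is above the threshold.

    Hypothetical helper: expects lines such as "HOST1;0.95", i.e. the
    staleness column must be the last column requested in the query.
    """
    stale = 0
    for line in lq_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Take everything after the last ';' as the staleness value.
        _, _, value = line.rpartition(";")
        try:
            if float(value) > threshold:
                stale += 1
        except ValueError:
            continue  # ignore rows that do not end in a number
    return stale


# The sample output from the post: nothing is above 1.0 yet.
sample = """HOST1;0.95
HOST2;0.95
HOST3;0.95
HOST4;0.683333
HOST5;0.95
HOST6;0.0666667"""

print(count_stale(sample))  # prints 0
```

For services, the input would come from a query on the services table with staleness as the last column, for example the columns host_name, description and staleness.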

Obviously, one still has to write a check for this. I hope I haven’t made a mistake in my thinking and am not leading you down a completely wrong path… :)
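As a rough idea of what such a check could print, the sketch below turns a stale count into one line in the local check style (status, service name, metric with warn/crit levels, detail text). The service name Stale_services, the metric name stale and the levels 1 and 10 are placeholders I made up, not anything prescribed by Checkmk.

```python
def local_check_line(stale_count: int, warn: int = 1, crit: int = 10,
                     service_name: str = "Stale_services") -> str:
    """Build one local check output line from a stale count.

    Hypothetical helper: the name, metric and thresholds are assumptions.
    """
    if stale_count >= crit:
        state = 2  # CRIT
    elif stale_count >= warn:
        state = 1  # WARN
    else:
        state = 0  # OK
    metric = f"stale={stale_count};{warn};{crit}"
    return f"{state} {service_name} {metric} {stale_count} stale services"


print(local_check_line(0))  # prints: 0 Stale_services stale=0;1;10 0 stale services
```

The count itself would come from the Livestatus query; this function only formats the result.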

Perhaps someone from the forum can confirm or refute this, or has a better idea. In the meantime, here is the official Livestatus documentation, with lots of helpful hints and examples:

5.3.2: Retrieving status data via Livestatus

HTH,
Thomas
