Local check data lifetime

Let me explain what happens if you put a regular agent plugin into a subdirectory named after a number of seconds, say 90. Let’s assume the agent is called every 60 seconds and the plugin runs for 50 seconds:

  • 00s: The agent gets called. It doesn’t see any cachefile below /var/lib/check_mk_agent/cache/, so it starts the agent plugin in the background (nohup … &) and redirects its output to a cachefile below /var/lib/check_mk_agent/cache/. The agent returns all other data except that from the plugin.

  • 50s: The plugin is done (with rc=0). Its output is written to /var/lib/check_mk_agent/cache.

  • 60s: The agent gets called. It sees the cachefile, which is only 10 seconds old, and returns it with <<<...:cached(50,90)>>> in the section header (50 being the timestamp of the file and 90 the maximum age). This “decoration” allows the server to check how old the data is and whether it is still valid. The agent doesn’t call the plugin again because 10 < 90.

  • 120s: The agent gets called. It sees the cachefile, which is now 70 seconds old (120 - 50), and returns it. But again, it doesn’t call the plugin, because 70 < 90 (i.e. the cachefile is not yet outdated).

  • 180s: The agent gets called. The cachefile is now 180 - 50 = 130 seconds old. I find this at least surprising, but that outdated cachefile is indeed still returned. Now the plugin is called again, because 130 (cachefile age) is greater than 90 (directory name).

  • 230s: Same as at 50s: the plugin is done and its output is written to a fresh cachefile.
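The walk-through above can be replayed with a small simulation. This is a simplified model written for illustration, not the agent’s actual code: the agent is polled at exact 60 s intervals, the plugin always takes 50 s and exits with rc=0, the cachefile timestamp is the moment the plugin finished, and the plugin is restarted once that age is strictly greater than the 90 from the directory name. Times are seconds since the first call, as in the bullets; the function name `simulate` is hypothetical.

```python
def simulate(duration, interval=60, runtime=50, max_age=90):
    """Replay agent calls from t=0 to t=duration (inclusive).

    Returns the times at which the plugin is started and, for every agent
    call, (time, header decoration or None, cachefile age or None).
    """
    cache_mtime = None    # when the cachefile was last written; None = no file yet
    running_until = None  # when the background plugin run finishes; None = idle
    starts, calls = [], []
    for t in range(0, duration + 1, interval):
        # A background run that finished since the last call has written its cachefile.
        if running_until is not None and running_until <= t:
            cache_mtime, running_until = running_until, None
        if cache_mtime is None:
            age = None
            calls.append((t, None, None))  # nothing cached yet: plugin data is missing
        else:
            age = t - cache_mtime
            # The cachefile is returned with its decoration, even if it is outdated.
            calls.append((t, f"cached({cache_mtime},{max_age})", age))
        # Start the plugin if no run is in progress and the cache is missing or outdated.
        if running_until is None and (age is None or age > max_age):
            starts.append(t)
            running_until = t + runtime
    return starts, calls


starts, calls = simulate(360)
print(starts)  # [0, 180, 360]
for t, header, age in calls:
    print(f"{t:>4}s  header={header}  age={age}")
```

The call at 180 s comes out as `(180, 'cached(50,90)', 130)`: the outdated cachefile is returned in the same call that triggers the next plugin run, which matches the surprising case in the 180s bullet.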

As you can see, with a 60-second agent interval and a 50-second plugin runtime, putting the plugin into the 90 directory results in a plugin call every 180 seconds. In the meantime, the agent returns the cached file, and the server can see from the :cached(timestamp-of-cachefile,90) part of the section header how old the data is and how long it can be considered valid. If the data is too old, the server will show it dithered and tell you that it is outdated.
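Under the same simplified model, the 180 seconds can be worked out directly rather than stepped through: the cachefile age is counted from when the plugin finished, so the next start is the first agent call strictly later than runtime + max age. The helper below is a hypothetical illustration of that arithmetic, assuming the agent is polled at fixed intervals starting at t=0; it is not something the agent computes.

```python
def plugin_start_period(interval, runtime, max_age):
    """Seconds between two plugin starts in the simplified model.

    Assumes the agent is polled every `interval` seconds starting at t=0 and
    the plugin is restarted at the first poll where the cachefile age,
    counted from when the plugin finished (t = runtime), exceeds `max_age`.
    """
    # First multiple of `interval` strictly greater than runtime + max_age.
    return interval * ((runtime + max_age) // interval + 1)


print(plugin_start_period(60, 50, 90))  # 180, as in the walk-through
print(plugin_start_period(60, 5, 90))   # 120: even a quick plugin is not rerun every 90 s
```

So with a 60-second polling interval, the directory number acts as a lower bound on data age before a rerun, and the effective period is always rounded up to the next agent call after the cachefile has passed that age.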

As for your 2nd question: if the plugin running in the background exits with a bad return code (exit 1 instead of exit 0), the background job simply discards the cachefile the plugin may have written so far, so there is no data to return the next time the agent runs. The agent will then restart the plugin.
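The failure handling described above can be sketched as follows. This is a hypothetical helper (`run_plugin_to_cache`, and the `.new.<pid>` temporary name, are my own choices), not the check_mk_agent code: it writes the plugin output to a temporary file, publishes it as the cachefile only on rc=0, and otherwise discards it and leaves no cachefile behind, following the “no data to return” behaviour described here.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_plugin_to_cache(cmd, cachefile):
    """Run `cmd` and keep its stdout as `cachefile` only if it exits with rc=0."""
    cache = Path(cachefile)
    tmp = cache.with_name(f"{cache.name}.new.{os.getpid()}")  # hypothetical temp name
    with open(tmp, "wb") as out:
        rc = subprocess.run(cmd, stdout=out).returncode
    if rc == 0:
        os.replace(tmp, cache)          # publish the complete output in one step
    else:
        tmp.unlink()                    # discard whatever the failed run wrote
        cache.unlink(missing_ok=True)   # leave no data, so the next agent call restarts the plugin
    return rc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cachefile = os.path.join(d, "my_plugin")
        ok = [sys.executable, "-c", "print('<<<my_plugin>>>')"]
        bad = [sys.executable, "-c", "import sys; print('partial'); sys.exit(1)"]
        print(run_plugin_to_cache(ok, cachefile), os.path.exists(cachefile))   # 0 True
        print(run_plugin_to_cache(bad, cachefile), os.path.exists(cachefile))  # 1 False
```

Writing to a temporary file first also means an agent call that happens while the plugin is still running never sees half-written output.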

So much for “regular” agent plugins. I haven’t looked too deeply into the behaviour of local checks. Unfortunately, they don’t seem to be decorated with that :cached(x,y) thingy when they are returned, and that might be exactly your issue.
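To make the difference concrete, here is a sketch of how a consumer could read the :cached(x,y) decoration. The parsing rules (options separated by `:`, timestamp and max age as plain integers) are assumptions based on the header format quoted in the walk-through, and the function names are hypothetical; this is not Checkmk’s actual parser. The point is only that an undecorated <<<local>>> header gives the server nothing to compute an age from.

```python
import re


def parse_section_header(header):
    """Split '<<<name:opt1:opt2>>>' into (name, cached), where cached is
    (timestamp, max_age) from a ':cached(ts,age)' option, or None."""
    if not (header.startswith("<<<") and header.endswith(">>>")):
        raise ValueError(f"not a section header: {header!r}")
    name, *options = header[3:-3].split(":")
    for option in options:
        m = re.fullmatch(r"cached\((\d+),(\d+)\)", option)
        if m:
            return name, (int(m.group(1)), int(m.group(2)))
    return name, None


def cache_age(header, now):
    """Age of the cached data in seconds, or None if the section is not decorated."""
    _, cached = parse_section_header(header)
    return None if cached is None else now - cached[0]


print(parse_section_header("<<<my_plugin:cached(50,90)>>>"))  # ('my_plugin', (50, 90))
print(cache_age("<<<my_plugin:cached(50,90)>>>", now=180))     # 130
print(cache_age("<<<local>>>", now=180))                       # None
```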

What you could do is put your local check plugin somewhere else (outside the checkmk directories), have it called by cron, and write its output to a spoolfile whose name is prefixed with a number, like so:

/var/lib/check_mk_agent/spool/90-my-spoolfile

The file must then start with the section header <<<local>>>. It will only be returned if it is younger than 90 seconds (the number prefix).
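A minimal sketch of both sides of that approach, assuming the behaviour described above: `write_spoolfile` is what the cron job could do (write atomically, <<<local>>> first), and `spool_is_fresh` models the agent’s “younger than the prefix” rule. Both function names, the example check line, and the use of a temporary directory instead of /var/lib/check_mk_agent/spool are hypothetical; so is the assumption that the agent skips dotfiles while the file is being written, and that files without a number prefix are always returned.

```python
import os
import re
import tempfile
import time


def write_spoolfile(path, lines):
    """Atomically write local check lines, preceded by <<<local>>>, to `path`."""
    # Temp file in the same directory so os.replace() is an atomic rename.
    # The leading dot is meant to keep the agent from reading a half-written
    # file (assumption: the agent skips dotfiles in the spool directory).
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", prefix=".")
    with os.fdopen(fd, "w") as f:
        f.write("<<<local>>>\n")
        for line in lines:
            f.write(line + "\n")
    os.replace(tmp, path)


def spool_is_fresh(path, now=None):
    """True if the file is younger than the number its name starts with."""
    m = re.match(r"\d+", os.path.basename(path))
    if m is None:
        return True  # assumption: files without a number prefix are always returned
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) < int(m.group())


if __name__ == "__main__":
    spool = tempfile.mkdtemp()  # stand-in for /var/lib/check_mk_agent/spool
    path = os.path.join(spool, "90-my-spoolfile")
    write_spoolfile(path, ["0 My_local_check - everything fine"])  # hypothetical check line
    print(spool_is_fresh(path))                                    # True right after writing
    print(spool_is_fresh(path, now=os.path.getmtime(path) + 120))  # False: older than 90 s
```

If cron runs the job every minute, the file stays younger than 90 seconds as long as the job keeps writing it; once the job stops, the agent stops returning the section instead of serving stale data.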
