Local cached checks are not updated since upgrade to 2.0 (outages are NOT recognized)

Hi @moritz

thanks for your suggestion.

This still does not look like the right solution to me. I still think that a fix on the server side would be the better way (especially in the short term).

Your solution would need to

  • distinguish between
    ** checks which are cached less than the check interval
    ** checks which are cached longer than the check interval
    ** (and the check interval is not really known on the client side)
  • define a new directory to place the checks with a cache time less than the check period (to avoid
    processing by checkmk caching)
  • create a plugin which searches all files in this directory and wraps them as you suggested

So we would

  • have different procedures depending on cache time vs. check interval
  • need to move checks to other locations whenever we change the check interval on the server side (so effectively we could not change the check interval)
  • for a clean solution, effectively need to tell the client the check interval
  • confuse our users, as we have many 45s checks and also checks with longer cache times

What are the cons of the solution I suggested?

  • There is an existing caching infrastructure which is working well
  • Currently there is just a “misinterpretation” of cached data which was not updated as there was no trigger for it
  • effectively it's just a refinement of the definition of stale from “older than cachetime” to “older than cachetime AND last check/check interval”

This would

  • need no change on the client side
  • keep a clean and simple way to cache checks
  • allow us to change the check interval as needed
  • be much easier to implement and fit in
  • be consistent with the checkmk 1.x behaviour
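
As a minimal sketch of the refined definition of stale mentioned above (an illustration only, not actual checkmk code; the function and parameter names are hypothetical, and reading "AND last check/check interval" as "also older than the check interval" is an assumption):

```python
def is_stale_current(age_s: float, cache_time_s: float) -> bool:
    """Definition as described in this post: stale once older than the cache time."""
    return age_s > cache_time_s


def is_stale_refined(age_s: float, cache_time_s: float, check_interval_s: float) -> bool:
    """Refined definition (assumed reading): stale only if the cached result is
    older than the cache time AND older than the server-side check interval."""
    return age_s > cache_time_s and age_s > check_interval_s


# Hypothetical numbers: a 45s cached check, polled every 60s by the server.
cache_time_s = 45
check_interval_s = 60

# Result is 50s old when the server fetches it: past the cache time,
# but it could not have been refreshed earlier without a trigger.
print(is_stale_current(50, cache_time_s))                    # True
print(is_stale_refined(50, cache_time_s, check_interval_s))  # False

# Result is 130s old: stale under both definitions.
print(is_stale_refined(130, cache_time_s, check_interval_s)) # True
```

With such a rule the client needs no knowledge of the check interval; only the server, which already knows it, takes it into account.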

I see only pros for this solution and none for a definition on the client side.
What am I missing?

Currently it seems much easier to patch the checkmk after each update…

Regards
Michael

PS: Putting your solution in plugins/60 would also fail if we change the check interval (e.g. to 2 minutes), leading to no alerting at all.
