Hello everyone,
Tonight our 24/7 on-call got an alert for a service that went CRIT and then back to OK right away.
We noticed something strange in our notification history:
the Systemd Service Summary listed a service as CRIT with “activating for 8 d”, sent a mail, and then switched back to OK right away. The server has been monitored for months now.
The default thresholds for a service in the “activating” state should be 30 seconds for WARN and 60 seconds for CRIT.
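For reference, this is how I understand the threshold logic, written as a minimal Python sketch. The function name and the assumption that the levels are inclusive (">=") are mine and hypothetical; this is not the monitoring plugin's actual code.

```python
# Hypothetical sketch: classify a unit in the "activating" state by how long
# it has been in that state, using warn/crit levels in seconds.
# The 30 s / 60 s defaults come from the post; inclusive (>=) comparison
# is an assumption.

def activating_state(duration_s: float, warn_s: float = 30.0, crit_s: float = 60.0) -> str:
    """Return OK, WARN or CRIT for a unit activating for duration_s seconds."""
    if duration_s >= crit_s:
        return "CRIT"
    if duration_s >= warn_s:
        return "WARN"
    return "OK"


EIGHT_DAYS_S = 8 * 24 * 60 * 60  # 691200 seconds

print(activating_state(10))            # OK
print(activating_state(45))            # WARN
print(activating_state(EIGHT_DAYS_S))  # CRIT
```

Under this reading, anything above 60 s is CRIT, so the state itself matches the reported 8 d duration. What it does not explain is where the 8 d figure came from, given that the service shows as active.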
I have several questions:
- Why does it go CRIT and right back to OK? The service is currently listed as “active” in systemctl.
- Why does this happen 8 days after the 60-second threshold was passed?
- Has anyone seen something like this before, and can you point me to where I can prevent such weirdness in the future?
This woke up our 24/7 on-call in the middle of the night, and he's not happy.
If you could help keep him from being woken up for nothing in the future, it would be much appreciated.
Service ‘systemd-tmpfiles-clean’ activating for: 8 d (warn/crit at 30.0 s/60 s) CRIT