I enabled all available logging, and when I trigger an error and tail my /var/log/*.log files with grep student-20, I see the following a couple of times:
2024-01-29 11:44:30 [7] [notification helper 4084] service "student-20.xxxx;HTTPS Homepage": postponing, notifications are disabled, but periodic notifications are enabled
I checked my configuration and this is correct: when a service has a problem and is not acknowledged, we want a recurring notification that is sent every 5 minutes (until acknowledged).
I disabled that setting just for testing purposes and triggered a problem again.
When the first CRIT state was found, no notification was sent, and none was sent after 3 check attempts either. Then I saw:
2024-01-29 11:51:37 [7] [notification helper 17114] service "student-20.xxxxx;HTTPS Homepage": postponing, delayed notification.
That is also expected, because I had a rule that delays notifications by 5 minutes.
When I disabled that “Delay Notification” rule and triggered a CRIT, I received a Telegram message, but only one, not one every x minutes as we want. So I enabled “Periodic notifications during service problems” and triggered a CRIT again.
Suddenly I get notified after just 1 check attempt. In the logs I see:
2024-01-29 14:14:59 [7] [alert helper 15834] not sending alert of type CHECKRESULT about service "student-20.XXXXX;HTTPS Homepage": there are no alert handlers defined
2024-01-29 14:14:59 [7] [alert helper 15834] not sending alert of type STATECHANGE about service "student-20.XXXXX;HTTPS Homepage": there are no alert handlers defined
2024-01-29 14:14:59 [7] [core 15788] released SerialToken{Request[student-20.XXXXX],3122} => SerialTokenFactory{3122:10}
2024-01-29 14:14:59 [7] [core 15788] scheduling service "student-20.XXXXX;HTTPS Homepage" at 2024-01-29 14:15:59 with commandline [/omd/sites/poort80hs/lib/nagios/plugins/check_http --ssl -t 60 --onredirect=follow -e 200,302,301 --sni -I 'student-20.XXXXX' -H 'student-20.XXXXX']
2024-01-29 14:14:59 [7] [core 15788] [generic pool scheduler] scheduling service "student-20.XXXXX;HTTPS Homepage" at 2024-01-29 14:15:59
2024-01-29 14:14:59 [7] [alert helper 15834] not sending alert of type CHECKRESULT about host "student-20.XXXXX": there are no alert handlers defined
2024-01-29 14:14:59 [7] [alert helper 15834] not sending alert of type CHECKRESULT about host "student-20.XXXXX": there are no alert handlers defined
2024-01-29 14:15:00 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": sending PROBLEM notification to its contacts
2024-01-29 14:15:00 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": spooling notification to rule based notifications
HOSTNAME=student-20.XXXXX
HOSTALIAS=student-20.XXXXX
2024-01-29 14:15:00 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": next notification in 5 minutes
HOSTNAME=student-20.XXXXX
HOSTALIAS=student-20.XXXXX
2024-01-29 14:15:01 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:02 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:03 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:04 [7] [core 15788] [livestatus external] spooled command 'LOG;SERVICE NOTIFICATION:xxxxxxxxxx
.......
2024-01-29 14:15:06 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:07 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:08 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:09 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:10 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:11 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:12 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:13 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:14 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:15 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:16 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:17 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:18 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:19 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:20 [7] [alert helper 15834] not sending alert of type CHECKRESULT about host "student-20.XXXXX": there are no alert handlers defined
2024-01-29 14:15:20 [7] [alert helper 15834] not sending alert of type CHECKRESULT about host "student-20.XXXXX": there are no alert handlers defined
2024-01-29 14:15:20 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
2024-01-29 14:15:21 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification
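Because the "postponing, periodic notification" line repeats every second, the interesting entries are easy to miss. Below is a minimal Python sketch that collapses the output into one count per distinct message. It assumes the line layout seen in the excerpt above (`date time [level] [component pid] message`); the names `LINE_RE` and `summarize` are mine and not part of CheckMK.

```python
import re
from collections import Counter

# Layout assumed from the excerpt above:
# "<date> <time> [<level>] [<component> <pid>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\d+)\] "
    r"\[(?P<component>[^\]]*?) (?P<pid>\d+)\] "
    r"(?P<message>.*)$"
)


def summarize(lines, component="notification helper"):
    """Count how often each distinct message from one component occurs."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines such as "HOSTNAME=..." do not match the layout and are skipped.
        if match and match.group("component") == component:
            counts[match.group("message")] += 1
    return counts


sample = [
    '2024-01-29 14:14:59 [7] [alert helper 15834] not sending alert of type CHECKRESULT about host "student-20.XXXXX": there are no alert handlers defined',
    '2024-01-29 14:15:00 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": sending PROBLEM notification to its contacts',
    'HOSTNAME=student-20.XXXXX',
    '2024-01-29 14:15:01 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification',
    '2024-01-29 14:15:02 [7] [notification helper 16890] service "student-20.XXXXX;HTTPS Homepage": postponing, periodic notification',
]

for message, count in summarize(sample).items():
    print(f"{count:3d}x {message}")
```

Passing `component="alert helper"` or `component="core"` would give the same kind of summary for the other log sources.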
In my opinion it is unexpected that I receive a message here, because not all check attempts had been made yet, only 1 of the 3.
I made sure the site was OK again, waited a few minutes and triggered an error again.
When CheckMK noticed that CRIT, I was notified immediately, which should not happen.
I checked the notification analysis and saw that my first notification, at 14:15, was SERVICENOTIFICATIONNUMBER 1.
The last one I triggered (more than 5 minutes later) was SERVICENOTIFICATIONNUMBER 2.
Why that is, I have no clue, because the service had already been OK for more than 5 minutes.
What more can I do to accomplish this?
- When a service goes down, check it 3 times.
- When it is still down, send a notification.
- Wait x minutes and, if it is still not acknowledged, notify again.
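To make the behaviour I am after unambiguous, here is a small Python model of it. This is only a sketch of the intended logic, not how CheckMK implements it: the class `ServiceModel` and its fields are hypothetical, checks are assumed to run once per minute, and recovery notifications are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceModel:
    """Hypothetical model of the desired notification behaviour."""
    max_attempts: int = 3        # failed checks needed before notifying
    renotify_minutes: int = 5    # the "x minutes" between repeat notifications
    attempt: int = 0
    acknowledged: bool = False
    last_notified: Optional[int] = None
    sent: List[int] = field(default_factory=list)

    def check(self, minute: int, ok: bool) -> None:
        if ok:
            # Recovery resets the attempt counter and the notification state.
            self.attempt = 0
            self.acknowledged = False
            self.last_notified = None
            return
        self.attempt = min(self.attempt + 1, self.max_attempts)
        if self.attempt < self.max_attempts or self.acknowledged:
            return  # not yet confirmed, or someone is already working on it
        due = (self.last_notified is None
               or minute - self.last_notified >= self.renotify_minutes)
        if due:
            self.sent.append(minute)
            self.last_notified = minute


svc = ServiceModel()
for minute in range(13):
    if minute == 9:
        svc.acknowledged = True  # acknowledged in minute 9
    svc.check(minute, ok=False)
print(svc.sent)  # [2, 7]
```

In this model the first notification goes out on the third failed check (minute 2), one repeat follows 5 minutes later (minute 7), and nothing is sent after the acknowledgement.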