Problems excluding downtime from checks & notifications

CMK version: 2.1.0
OS version: Windows

I am hoping someone can advise on this. We have a SQL server that gets rebooted nightly. I have created:
a.) a time period to exclude this time frame
b.) a host downtime for this time frame
c.) established a host check period utilizing the time period which excludes the server down time (‘a.’ above)
d.) established a service check period for Check_MK service, also utilizing the time period which excludes the server down time (‘a.’ above)

However, Check_MK is still logging and reporting warnings for services that get logged during the excluded time period, which are occurring as a result of the reboot process on the server itself. Check_MK is simply waiting until the end of the designated “down time” time period to generate the notification.

Everything is set to exclude 2:37 AM through 2:35 AM for the server reboot process, but we still receive a notification at 2:35 AM, when the host down time ends, reporting that a warning entry was created on the server at 2:31 AM (which, as stated above, happened simply because the logged service was restarting as part of the reboot).

Please advise if you can. I am still new to Check_MK and cannot figure out why these warnings are not being ignored. They occur during the down time, and by the time the down time is over, the reported services are back up and running again, so they should not be getting logged/notified.

Thanks in advance.

Sorry, the down time is actually 2:27 through 2:35.

Has nobody else experienced this situation?

Can anybody please comment and provide some insight as to how these warnings can still be getting logged/notified? Everything in our configuration tells me we should no longer be getting them so I am confounded as to what I have missed in trying to prevent them.

Any advice would be greatly appreciated. Thanks.

Has really nobody else experienced this issue?

Please provide some insight. This is quite frustrating, when we have downtimes excluded from our service monitoring rules, but are still receiving warning notifications for events that occur during those down times.

I am starting to receive negative feedback from both management and the team members who are the recipients of these messages because I am unable to prevent them…when we know the events are the direct result of unavailable services during the server reboot process.

I would really appreciate any guidance that anybody can give as to how to prevent warnings from registering, or at least avoid receiving notifications for those events when they do occur.

Thank you in advance for any assistance you might provide.

@tbonney, am I right in assuming that you are monitoring the logs of the Windows server? Is that the service that causes the notifications you get?

Hello @robin.gierse. Thank you for your reply.

Yes, it is most often the system log and application log entries that we are receiving these notifications for. As stated in my previous posts though, the warnings are for events that were created as a direct result of the server reboot (like WinRM, for example) because the services are unavailable during the reboot process. Having created a down time and excluded that from the CheckMK monitoring period, I had hoped it would ignore these types of entries. However, what seems to be happening is that once the CheckMK down time period expires, it then reports and sends notifications for these warnings, which actually occurred during the excluded down time.

Do you have any advice on how to avoid this? Thank you.

As far as I can see, you are using the “old” log monitoring, not the event console. This creates services named “LOG foobar”. These services stay CRIT (or WARN), until the log messages received are acknowledged. I assume this leads to your situation, and it works as designed and there is no way around this.
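
To make that concrete, here is a toy model in plain Python. It is not Checkmk code and uses no Checkmk API; the class, the function names, and the message text are all made up for illustration. It only assumes what is described in this thread: the log service stays non-OK until its messages are acknowledged, and notifications are held back while the downtime (2:27 through 2:35) is active.

```python
from dataclasses import dataclass, field
from datetime import time

LEVELS = {"OK": 0, "WARN": 1, "CRIT": 2}


@dataclass
class StickyLogService:
    """Hypothetical stand-in for a "LOG ..." service: non-OK until acknowledged."""
    unacknowledged: list = field(default_factory=list)

    def receive(self, logged_at: time, level: str, text: str) -> None:
        # Every received message is kept until someone acknowledges it.
        self.unacknowledged.append((logged_at, level, text))

    def acknowledge_all(self) -> None:
        self.unacknowledged.clear()

    def state(self) -> str:
        # The service state is the worst level among unacknowledged messages.
        worst = "OK"
        for _, level, _ in self.unacknowledged:
            if LEVELS[level] > LEVELS[worst]:
                worst = level
        return worst


def in_window(now: time, start: time, end: time) -> bool:
    return start <= now < end


def should_notify(service: StickyLogService, now: time, start: time, end: time) -> bool:
    # Assumption: notifications are suppressed only while the downtime is active.
    return service.state() != "OK" and not in_window(now, start, end)


start, end = time(2, 27), time(2, 35)
service = StickyLogService()
service.receive(time(2, 31), "WARN", "made-up reboot-related log entry")

during = should_notify(service, time(2, 31), start, end)     # inside the downtime
after = should_notify(service, time(2, 35), start, end)      # downtime just ended
service.acknowledge_all()
after_ack = should_notify(service, time(2, 36), start, end)  # messages acknowledged
```

In this model the entry logged at 2:31 does not notify while the downtime is active, but the service is still WARN at 2:35, so the notification goes out then. Only acknowledging the messages brings the state back to OK, which matches what you are seeing.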

Actually, with all that you have tried (downtimes, time periods, etc.) you are probably way over the top.

I recommend forwarding events to the event console and handling them there. That is far more efficient and gets rid of the problem described here.
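
The general idea behind handling events individually can be sketched like this. Again, this is plain Python and not Event Console configuration; `Event`, `events_to_notify`, and the sample entries are hypothetical, and how to set up an equivalent rule in the event console is something to check in the documentation. The point is only that each event is judged by its own timestamp instead of leaving a service in a lingering WARN state.

```python
from datetime import time
from typing import List, NamedTuple


class Event(NamedTuple):
    logged_at: time
    level: str
    text: str


def outside_reboot_window(event: Event, start: time, end: time) -> bool:
    # An event counts only if it was logged outside the nightly reboot window.
    return not (start <= event.logged_at < end)


def events_to_notify(events: List[Event], start: time, end: time) -> List[Event]:
    # Drop everything logged during the window, keep the rest.
    return [e for e in events if outside_reboot_window(e, start, end)]


events = [
    Event(time(2, 31), "WARN", "made-up reboot-related entry"),
    Event(time(9, 15), "WARN", "made-up unrelated entry"),
]
kept = events_to_notify(events, time(2, 27), time(2, 35))
```

Here the 2:31 entry is dropped because it falls inside the 2:27 through 2:35 window, and only the 9:15 entry is left to notify on.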

Thanks for the insight Robin. I’ll have to investigate using the event console. I’m very new to Check MK, so the information looks daunting, but I’ll research it further and see if it might be a good approach to address this issue for me. Thanks again.

I sympathize, @tbonney. Getting to know Checkmk takes time and effort, I am not going to lie. But it is worth it, I can promise you. Good luck on your journey, I am positive that you will succeed! :muscle: