I have an issue: when we put a bulk downtime/outage in place (200+ hosts, thousands of services), a fallback notification is sent via the check-mk-notify user for every single downtime event. In our notification rules we do not have ‘start or end of a scheduled downtime’ enabled for host or service events, but for whatever reason it still tries to fire this as a fallback. The result is thousands of notifications that seem to kill the monitoring core. Our server literally goes idle and all polling stops until it has dealt with all the notifications, which can take up to 10 minutes. By that time every host/service is stale, and once the notifications are finished the server has to thrash to catch up on all the polling.
If we go into the check_mk_objects file, find the definition for the check-mk-notify user, and remove the ‘s’ option from host_notification_options and service_notification_options, it's all good: it doesn't notify for downtime events. But that file is dynamically created and is regenerated every time there is a change in check_mk/WATO, so how can we make that change permanent?
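To make the manual edit concrete, here is a rough Python sketch of what we are effectively doing by hand. The function name `strip_downtime_option` and the sample contact definition are made up for illustration (the sample is not copied from our generated file), and this is not a permanent fix, since anything applied to the generated file is lost on the next regeneration.

```python
import re

# Standard Nagios contact directives whose option lists we edit by hand.
OPTION_DIRECTIVES = ("host_notification_options", "service_notification_options")


def strip_downtime_option(contact_definition: str) -> str:
    """Remove the 's' (scheduled downtime) flag from the notification option lists."""
    out = []
    for line in contact_definition.splitlines():
        # Split a directive line into indent, name, spacing, and value.
        match = re.match(r"^(\s*)(\S+)(\s+)(\S.*)$", line)
        if match and match.group(2) in OPTION_DIRECTIVES:
            indent, name, gap, value = match.groups()
            options = [o.strip() for o in value.split(",")]
            kept = [o for o in options if o != "s"]
            line = f"{indent}{name}{gap}{','.join(kept)}"
        out.append(line)
    return "\n".join(out)


# Illustrative sample only (assumed layout), not our real generated definition.
SAMPLE = """define contact {
  contact_name                  check-mk-notify
  host_notification_options     d,u,r,f,s
  service_notification_options  u,c,w,r,f,s
}"""

print(strip_downtime_option(SAMPLE))
```

With the ‘s’ flag gone from both lines, the core no longer considers the contact for downtime start/end events, which matches what we see after editing the file manually.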
Running 1.6.0p14 Raw on RHEL 7.7.