Since 2.2.0: "Periodic notifications during host problems" rule triggers periodic service notifications?

CMK version: 2.2.0p5 managed
OS version: Debian 11

Since updating to 2.2.0, we have observed that the rule “Periodic notifications during host problems” also triggers periodic notifications for services on those hosts.

“Periodic notifications during service problems” is disabled, and “Periodic notifications during host problems” is set to 120 minutes.

Up to and including 2.1.0, only host problem notifications were repeated every 120 minutes, as expected.

But after the update to 2.2.0, notifications for service problems on those hosts are repeated as well, every 120 minutes.

To validate that the notifications are triggered by the rule “Periodic notifications during host problems”, I changed its interval to 5 minutes and indeed, the notifications for service problems were repeated every 5 minutes.

Was there any change in 2.2 that might explain this changed behavior (I would call it misbehavior)?

This site started on Raw Edition 1.2 and was updated via 1.6 and 2.0.0 up to 2.1.0p20.cre. On 2.1.0p20 it was upgraded to the Managed Edition and then updated again to 2.1.0p30.cme. From there, we moved to 2.2 (2.2.0p4.cme), which is where the problem started.

I looked at cmc.log and notify.log, but all I can see is that SERVICENOTIFICATIONNUMBER is indeed incremented. I can’t find any hint as to why these notifications are repeated.

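(cf. my first attempt to describe this problem, in German: “Seit Upgrade auf 2.2.0 plötzlich Periodic notifications für Services, obwohl disabled?”, i.e. “Since upgrading to 2.2.0, suddenly periodic notifications for services, although disabled?”, post #2 by Norm)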

After reverse engineering the changes from 2.1 to 2.2, I’m now fairly sure that a bug was introduced in cee/microcore_config.py in 2.2.0:

The function _get_generic_service_object_info() in lib/python3/cmk/base/cee/microcore_config.py changed as follows from 2.1.0p30 to 2.2.0p5:

```diff
@@ -1756,7 +1806,7 @@
         if service_data.command_name == "check-mk-inventory":
             default_retry_interval = check_interval
         retry_interval = float(attrs.get("retry_interval", default_retry_interval))
-        notification_interval = float(attrs.get("notification_interval", 0.0))
+        notification_interval = self._host_attrs.get("notification_interval", 0.0)
         first_notification_delay = float(attrs.get("first_notification_delay", 0.0))
         notifications_enabled = bool(int(attrs.get("notifications_enabled", True)))
         flap_detection_enabled = bool(int(attrs.get("flap_detection_enabled", True)))
```

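It doesn’t make sense to use the host’s notification_interval inside _get_generic_service_object_info()! With this change, every service on a host that has “Periodic notifications during host problems” set inherits that host’s interval, which is exactly what we observe.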
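To make the effect of the one-line change concrete, here is a minimal sketch that models only the notification_interval lookup. The class and method names are hypothetical stand-ins, not the real code in microcore_config.py, and it assumes the intervals are plain numbers in minutes, with 0.0 presumably meaning "do not repeat" (as in Nagios).

```python
from typing import Any, Mapping


class MicrocoreConfigSketch:
    """Hypothetical stand-in for the config generator in cee/microcore_config.py.

    Only the notification_interval lookup for services is modelled here.
    """

    def __init__(self, host_attrs: Mapping[str, Any]) -> None:
        self._host_attrs = host_attrs

    def service_interval_2_2_0(self, attrs: Mapping[str, Any]) -> Any:
        # 2.2.0p5 line: ignores the service's own attrs and reads the HOST value
        return self._host_attrs.get("notification_interval", 0.0)

    def service_interval_2_1_0(self, attrs: Mapping[str, Any]) -> float:
        # 2.1.0p30 line: reads the service's own attrs; default 0.0 = no repeat
        return float(attrs.get("notification_interval", 0.0))


# Host rule "Periodic notifications during host problems" = 120 minutes;
# the service rule is disabled, so the service attrs carry no interval.
cfg = MicrocoreConfigSketch(host_attrs={"notification_interval": 120.0})
service_attrs: dict[str, Any] = {}

print(cfg.service_interval_2_2_0(service_attrs))  # 120.0 (service repeats every 120 min)
print(cfg.service_interval_2_1_0(service_attrs))  # 0.0 (service does not repeat)
```

Under these assumptions, the 2.2.0 variant hands the host’s 120 minutes to every service, which matches the observation that changing the host rule to 5 minutes also changed the service repeat interval to 5 minutes.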

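Looking at the corresponding host function _get_generic_host_object_info(), there’s a somewhat similar change. There, however, only the float() conversion was dropped, and the value is still read from self._host_attrs, which is correct for a host: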

```diff
@@ -1311,7 +1357,7 @@
         max_check_attempts = int(self._host_attrs.get("max_check_attempts", 1))
         check_interval = float(self._host_attrs.get("check_interval", default_check_interval))
         retry_interval = float(self._host_attrs.get("retry_interval", default_check_interval))
-        notification_interval = float(self._host_attrs.get("notification_interval", 0.0))
+        notification_interval = self._host_attrs.get("notification_interval", 0.0)
         first_notification_delay = float(self._host_attrs.get("first_notification_delay", 0.0))
         notifications_enabled = bool(int(self._host_attrs.get("notifications_enabled", True)))
         flap_detection_enabled = bool(int(self._host_attrs.get("flap_detection_enabled", True)))
```

When I revert the change in _get_generic_service_object_info(), so that notification_interval is read from the service’s attrs again, the behavior seems to be correct again.

I’ll report this to feedback@checkmk.com as well.


Thanks for the detailed analysis! It was indeed a regression in 2.2, fixed in “Fixed periodic service notification interval”.

Cheers,
S.
