Recurring downtime issues with distributed monitoring

CMK version: 2.2.0p9.cee (all)
OS version: Ubuntu 20.04.6 LTS (All)

There isn't really much in the way of error messages. The basics: I have distributed monitoring across 4 hosts, which I'll call oscar, bob, charlie, and victor. They're all at the same OS level, and all seem to work OK for most things. I've got a rule set up to apply recurring downtimes, something like ‘at 0400, set downtime on any host tagged as Windows for 2 hours’, applied to the main folder, /.

When the time comes (I've tried some debugging with another test rule, setting the time close to the current time, changing tags, etc.), hosts that match the rule go into downtime as expected … except for any hosts monitored by the ‘charlie’ host.

I'm starting to dig through the logs to find where things may have gone awry, but I haven't found anything useful so far, so I thought I'd ask if someone has ideas or pointers on where to look to sort this out.

thanks.

To clarify: oscar, bob, charlie, and victor are Checkmk sites, right?

Under this assumption, I can only imagine that there is a configuration issue. Does charlie get its configuration replicated from the central site (whichever that is)? If so, is it possible that your folder layout has an effect here? Speaking of time, have you checked the timezone configuration on your servers?
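To illustrate why the timezone question matters, here is a minimal Python sketch. It assumes (not confirmed for Checkmk in this thread) that each site evaluates the rule's start time in its own local timezone. The fixed UTC-4 offset for EDT and the example date are chosen purely for illustration.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed offsets keep the sketch self-contained; EDT is UTC-4.
UTC = timezone.utc
EDT = timezone(timedelta(hours=-4), "EDT")


def downtime_window(day, start_hour, hours, tz):
    """Return (start, end) of a downtime starting at start_hour local time in tz."""
    start = datetime(day.year, day.month, day.day, start_hour, tzinfo=tz)
    return start, start + timedelta(hours=hours)


def start_skew(day, start_hour, tz_a, tz_b):
    """How much later the same wall-clock start occurs in tz_b than in tz_a."""
    start_a, _ = downtime_window(day, start_hour, 0, tz_a)
    start_b, _ = downtime_window(day, start_hour, 0, tz_b)
    return start_b - start_a


if __name__ == "__main__":
    day = date(2023, 9, 1)  # arbitrary example date
    for name, tz in (("UTC site", UTC), ("EDT site", EDT)):
        start, end = downtime_window(day, 4, 2, tz)
        print(name, start.astimezone(UTC), "to", end.astimezone(UTC))
    print("skew:", start_skew(day, 4, UTC, EDT))
```

Under that assumption, a site running on UTC would start its "0400" downtime four hours before a site running on EDT, so the downtime would still exist, just at an unexpected time. A downtime that never appears at all points somewhere other than the timezone.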

Sorry, I didn't see any emails about a reply here. Yes, those names are CMK hosts/sites with distributed monitoring. oscar is the main host and does seem to push out config to everyone.

I hadn’t thought to look at timezones, but I suppose it’s feasible; I can look at that tonight sometime.

thanks.

Following up: oddly, two of the hosts (bob and another one I didn't include, which I'll call ‘karl’) had system time set to UTC. I changed every host to EDT for system time and verified that the omd site users (checked via omd su) all use EDT. I then created another test rule for a 1-minute recurring downtime on specific hosts, recurring every hour, and none of the hosts monitored by charlie ever make it into the recurring downtimes list - even for the configuration rule. I'm kind of at a loss for where to start digging next.
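One possible next step is to ask each site's core directly which downtimes it knows about, rather than relying on the central view. The sketch below is an assumption-laden starting point, not a confirmed diagnosis. It queries the Livestatus `downtimes` table over the site's UNIX socket (usually `tmp/run/live` under the site user's home directory; adjust the path if yours differs) and lists the expected hosts that have no downtime entry. The host names in the usage example are hypothetical.

```python
import json
import socket


def build_downtime_query():
    """Livestatus query for all downtimes known to the local core."""
    return (
        "GET downtimes\n"
        # fixed is 1 for a fixed downtime and 0 for a flexible one
        "Columns: host_name start_time end_time fixed comment\n"
        "OutputFormat: json\n"
        "\n"
    )


def query_livestatus(socket_path, query):
    """Send one query to a Livestatus UNIX socket and return the decoded rows."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the query is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


def hosts_without_downtime(expected_hosts, rows):
    """Return the expected hosts that do not appear in any downtime row."""
    in_downtime = {row[0] for row in rows}
    return sorted(set(expected_hosts) - in_downtime)


if __name__ == "__main__":
    # Run as the site user on each site in turn, e.g. on charlie.
    rows = query_livestatus("tmp/run/live", build_downtime_query())
    # Hypothetical host names; replace with hosts the rule should match.
    print(hosts_without_downtime(["winhost01", "winhost02"], rows))
```

If charlie's own core returns no downtime for hosts that the rule should match, while the other sites do, the problem is more likely on charlie (for example, in the configuration it received) than in how the central site displays the data.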

You are not using flexible downtimes, right?

Nothing flexible, just straight-up downtimes. Not sure if this reply will take this time.