Parent/Dependency Weirdness

Corvar · September 24, 2021, 4:55pm

I am having an issue with dependencies that I can’t seem to get my head around. I currently have something like this:

Host1<->Host2<->Host3,Host4<->CheckMK Server

I received the following alerts:
Host2 Service Ping Critical event
Host3 Down event
Host4 Down event
Host1 Down event

Host2 never officially goes to Down.

Host1 lists Host2 as a Parent
Host2 lists Host3 and Host4 as Parents
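To make the parent relationships above concrete, here is a minimal Python sketch that stores the configuration as a mapping from each host to its configured parents, then walks it to list every host between Host1 and the Checkmk server. The `PARENTS` dict and the `ancestors()` helper are hypothetical illustrations, not Checkmk's data model, and Host3/Host4 are simplified to connect directly to the server.

```python
# Hypothetical model of the parent configuration described above:
# each key lists the hosts configured as its parents.
PARENTS: dict[str, list[str]] = {
    "Host1": ["Host2"],
    "Host2": ["Host3", "Host4"],
    "Host3": [],  # simplification: treated as directly attached to the server
    "Host4": [],
}


def ancestors(host: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every host lying between `host` and the monitoring server."""
    seen: set[str] = set()
    stack = list(parents.get(host, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


print(sorted(ancestors("Host1", PARENTS)))  # ['Host2', 'Host3', 'Host4']
```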

My understanding is that Host1 shouldn’t go “Down” after Host3 and Host4 are both down; it should go “Unreachable”.

Am I missing something?

As for versions, I am on CheckMK-raw-2.0.0p9.
Host1 uses the check_mk_agent. Host2 is Ping Only (no API/no Agent). Host3/Host4 utilize SNMP.

thorian93 · September 26, 2021, 8:42am

I might be getting your question wrong (it’s early Sunday, mind you), but something about your situation sounds weird.

Indirectly, yes. Technically, Host 1 becomes UNREACHABLE when Host 2 goes DOWN or UNREACHABLE, and Host 2 becomes UNREACHABLE when both Host 3 and Host 4 go DOWN.
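The rule described here can be written as a small Python function. It is a sketch modelled on the Nagios-style reachability logic discussed in this thread, not Checkmk's actual implementation; the name `classify()` and the plain string states are illustrative assumptions.

```python
UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def classify(check_ok: bool, parent_states: list[str]) -> str:
    """Classify a host from its own check result and its parents' host states.

    Simplified rule (assumption): a failing host is UNREACHABLE only if it
    has parents and none of them is UP; otherwise it is DOWN.
    """
    if check_ok:
        return UP
    if parent_states and all(s in (DOWN, UNREACHABLE) for s in parent_states):
        return UNREACHABLE
    return DOWN


print(classify(False, [DOWN]))        # Host1 with Host2 DOWN -> UNREACHABLE
print(classify(False, [DOWN, DOWN]))  # Host2 with Host3 and Host4 DOWN -> UNREACHABLE
print(classify(False, [UP, DOWN]))    # Host2 with only Host4 DOWN -> DOWN
```

Note that a failing host with no parents at all is always DOWN under this rule, which matches how hosts attached directly to the monitoring server behave.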

Your configuration says:
From the viewpoint of the Checkmk server:
Host 2 can only be DOWN when at least one of Host 3 and Host 4 is UP.
Host 1 can only be DOWN when host 2 is UP.
Hosts 3 and 4 will always alert.
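Applying the same kind of rule to Host2 for every combination of Host3/Host4 states reproduces the first point of that list. This is a hypothetical sketch: `failing_host_state()` assumes Host2's own check has already failed, and Host3/Host4 are modelled without parents, so on their own they can only be UP or DOWN.

```python
from itertools import product

UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def failing_host_state(parent_states: tuple[str, ...]) -> str:
    """State of a host whose own check has failed (simplified rule, assumption)."""
    if parent_states and all(s != UP for s in parent_states):
        return UNREACHABLE
    return DOWN


# Host2's resulting state for each combination of its parents' states.
table = {
    (h3, h4): failing_host_state((h3, h4))
    for h3, h4 in product((UP, DOWN), repeat=2)
}
for (h3, h4), host2 in table.items():
    print(f"Host3={h3:<4} Host4={h4:<4} -> failing Host2 is {host2}")
```

Only the row where both Host3 and Host4 are DOWN yields UNREACHABLE; every other row leaves a failing Host2 DOWN and therefore alerting.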

What exactly happened that led to your alerts?

Corvar · September 26, 2021, 8:56pm

Things are more complex between the CheckMK server and Host3/Host4, and there are additional dependencies. But there was a network interruption between CheckMK and Host3/Host4, which resulted in a storm of alerts even though Host3 and Host4 were marked as down almost immediately. Many hosts on the far side (from CheckMK) of Host3/Host4 alerted, even though every one of them had a parent chain that eventually led to Host3/Host4.
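For an outage like this, one would expect parents to be resolved before their children, so that only the hosts nearest the interruption end up DOWN. The sketch below resolves a whole simplified topology from a single snapshot of failed checks. `resolve()` is a hypothetical helper; it assumes the parent graph is acyclic and that all checks fail at the same moment, which independently scheduled checks do not guarantee.

```python
UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"

# Simplified topology from this thread; the real path between the Checkmk
# server and Host3/Host4 is more complex and is left out (assumption).
PARENTS = {
    "Host1": ["Host2"],
    "Host2": ["Host3", "Host4"],
    "Host3": [],
    "Host4": [],
}


def resolve(parents: dict[str, list[str]], failed: set[str]) -> dict[str, str]:
    """Resolve every host's state, parents first, from one snapshot of failed checks.

    Assumes the parent graph has no cycles (otherwise the recursion never ends).
    """
    states: dict[str, str] = {}

    def state_of(host: str) -> str:
        if host not in states:
            if host not in failed:
                states[host] = UP
            else:
                parent_states = [state_of(p) for p in parents[host]]
                if parent_states and all(s != UP for s in parent_states):
                    states[host] = UNREACHABLE
                else:
                    states[host] = DOWN
        return states[host]

    for host in parents:
        state_of(host)
    return states


# The network interruption: every host behind the outage fails its check.
print(resolve(PARENTS, {"Host1", "Host2", "Host3", "Host4"}))
```

In this idealised snapshot only Host3 and Host4 come out DOWN, while Host2 and Host1 are UNREACHABLE. A real storm of DOWN alerts suggests the hosts were not all in their non-UP states at the moment each child was evaluated.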

r.sander · September 27, 2021, 7:35am

Parents only work with the host state, not with service states.

The host Host2 has to be down for Host1 to be unreachable.
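A sketch of that distinction: a host can have a CRIT service while its host state is still UP, and only the host state is consulted when deciding whether a child is DOWN or UNREACHABLE. The `Host` dataclass, the `PING` service name and `failing_child_state()` are illustrative assumptions, not Checkmk objects.

```python
from dataclasses import dataclass, field

UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


@dataclass
class Host:
    name: str
    host_state: str  # result of the host check
    services: dict[str, str] = field(default_factory=dict)  # e.g. {"PING": "CRIT"}


def failing_child_state(parents: list[Host]) -> str:
    """State of a child whose own check failed.

    Only the parents' host states are consulted; their service states are ignored.
    """
    if parents and all(p.host_state != UP for p in parents):
        return UNREACHABLE
    return DOWN


host2 = Host("Host2", host_state=UP, services={"PING": "CRIT"})
print(failing_child_state([host2]))  # DOWN, despite the CRIT service
```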

thorian93 · September 27, 2021, 7:37am

I think you misread, @r.sander. Hosts 3 and 4 were DOWN, so Host 2 should never have alerted in the first place.

Assuming your configuration of parents is flawless, @Corvar, I am at a loss here.

r.sander · September 27, 2021, 8:25am

Is the description in the first post in the order in which the events occurred?

Host2 never went down but sent a service notification first.

Host3 and Host4 then went down.

Host1 sends out a down notification because Host2 is still up. Only the direct parent host is relevant.
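Because only the direct parent's host state at the moment the child is evaluated matters, the order of events decides the outcome. The sketch below looks up a parent's state in a timestamped history. The timestamps are hypothetical, and Host2 going DOWN at t=300 is assumed purely for illustration (in this thread it never did); `state_at()` and `classify_failing_child()` are illustrative helpers.

```python
from bisect import bisect_right

UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def state_at(history: list[tuple[int, str]], t: int) -> str:
    """Host state in effect at time t, given (timestamp, new_state) changes
    sorted by timestamp; the first entry is treated as the initial state."""
    idx = bisect_right([ts for ts, _ in history], t) - 1
    return history[max(idx, 0)][1]


def classify_failing_child(parent_state: str) -> str:
    """Only the direct parent's host state decides DOWN vs UNREACHABLE."""
    return UNREACHABLE if parent_state in (DOWN, UNREACHABLE) else DOWN


# Hypothetical Host2 host-state history in seconds: UP from the start and,
# for illustration only, DOWN from t=300 onwards.
host2_history = [(0, UP), (300, DOWN)]

for t in (120, 400):  # two possible moments at which Host1's check fails
    print(t, classify_failing_child(state_at(host2_history, t)))
```

If Host1's check fails while Host2 is still UP (t=120), Host1 is reported DOWN; only a failure evaluated after Host2 is DOWN (t=400) yields UNREACHABLE.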

Corvar · September 27, 2021, 3:09pm

In my past dealings with Nagios, that isn’t what should happen. Specifically, Host1 shouldn’t send a Down notification because it and Host2 should both be in state Unreachable.

https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/networkreachability.html

I guess it is possible that when Host3 and Host4 went down, it triggered a check of all their children, Host2 responded (and so stayed UP), and as a result Host1 was not marked Unreachable.

r.sander · September 27, 2021, 9:01pm

This is what I meant to say. If the host state of Host2 is still UP, Host1 will not become UNREACH.

system · September 27, 2022, 9:01pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed. Contact an admin if you think this should be re-opened.