Parent/Dependency Weirdness

I am having an issue with dependencies that I can’t seem to get my head around. I currently have something like this:

Host1<->Host2<->Host3,Host4<->CheckMK Server

I received alerts
Host2 Service Ping Critical event
Host3 Down event
Host4 Down event
Host1 Down event

Host2 never officially goes to Down.

Host1 lists Host2 as a Parent
Host2 lists Host3 and Host4 as Parents

My understanding is that Host1 shouldn’t go “Down” after Host3 and Host4 are both down, it should go “Unreachable”.

Am I missing something?

For versions, I am on CheckMK-raw-2.0.0p9.
Host1 uses the check_mk_agent. Host2 is Ping Only (no API/no Agent). Host3/Host4 utilize SNMP.

I might get your question wrong - it’s early Sunday, mind you - but something about your situation sounds weird.

Indirectly, yes. Technically, Host 1 becomes UNREACHABLE when Host 2 goes DOWN or UNREACHABLE, and Host 2 becomes UNREACHABLE when both Host 3 and Host 4 go DOWN.

Your configuration says:
From the viewpoint of the Checkmk server:
Host 2 can only be DOWN when Host 3 or Host 4 is UP.
Host 1 can only be DOWN when host 2 is UP.
Hosts 3 and 4 will always alert.
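
To make the three rules above concrete, here is a minimal sketch in Python. It is a simplified model of the parent logic as described in this thread, not Checkmk's or Nagios's actual code. The `PARENTS` table and the `classify` function are hypothetical names, and the topology is the one from the first post.

```python
from typing import Dict, List

# Hypothetical topology from the first post: child -> direct parents.
PARENTS: Dict[str, List[str]] = {
    "Host1": ["Host2"],
    "Host2": ["Host3", "Host4"],
    "Host3": [],
    "Host4": [],
}


def classify(host: str, check_ok: bool, states: Dict[str, str]) -> str:
    """Return "UP", "DOWN" or "UNREACHABLE" for a single host.

    `states` maps the host's direct parents to their current host state.
    Only direct parents are consulted, never grandparents.
    """
    if check_ok:
        return "UP"
    parents = PARENTS[host]
    # No parents, or at least one parent still UP: the host itself is blamed.
    if not parents or any(states[p] == "UP" for p in parents):
        return "DOWN"
    # Every direct parent is DOWN or UNREACHABLE.
    return "UNREACHABLE"
```

With this model, `classify("Host2", False, {"Host3": "DOWN", "Host4": "DOWN"})` yields UNREACHABLE, while Host3 and Host4 have no parents and therefore always end up DOWN and alert.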

What exactly happened that led to your alerts?

Things are more complex between the CheckMK server and Host3/Host4 and there are additional dependencies. But there was a network interruption between CheckMK and Host3/Host4, which resulted in a storm of alerting despite Host3 and Host4 getting marked as down almost immediately. Many hosts on the far (from CheckMK) side of Host3/Host4 alerted, even though all hosts had a parent that eventually led to Host3/Host4.

Parents only work with the host state, not with service states.

The host Host2 has to be down for Host1 to be unreachable.

I think you misread @r.sander. Hosts 3 and 4 were DOWN, so Host 2 never should have alerted in the first place.

Assuming your configuration of parents is flawless @Corvar, I am at a loss here.

Does the description in the first post list the events in the order they occurred?

Host2 never went down but sent a service notification first.

Host3 and Host4 then went down.

Host1 sends out a down notification because Host2 is still up. Only the direct parent host is relevant.

In my past dealings with Nagios, that isn’t what should happen. Specifically, Host1 shouldn’t send a Down notification because it and Host2 should both be in state Unreachable.

https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/networkreachability.html

I guess it is possible that when Host3 and Host4 went down, it triggered a check of all children and Host2 responded (so stayed in Up) and then Host1 wouldn’t be Unreachable.

This is what I meant to say. If the host state of Host2 is still UP, Host1 will not become UNREACH.
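
The sequence discussed above can be replayed with a small simulation. This is only a sketch under the assumptions stated in the thread (only the direct parent's host state counts, and a failing service check does not change the host state). The `replay` function and the `PARENTS` table are hypothetical and do not reflect Checkmk internals.

```python
# Hypothetical topology from the first post: child -> direct parents.
PARENTS = {
    "Host1": ["Host2"],
    "Host2": ["Host3", "Host4"],
    "Host3": [],
    "Host4": [],
}


def replay(failed_host_checks):
    """Apply failed host checks in order and return the resulting host states."""
    states = {host: "UP" for host in PARENTS}
    for host in failed_host_checks:
        parents = PARENTS[host]
        # UNREACHABLE only if the host has parents and none of them is UP.
        if parents and all(states[p] != "UP" for p in parents):
            states[host] = "UNREACHABLE"
        else:
            states[host] = "DOWN"
    return states


# What the thread describes: Host2's host check never failed (only its
# Ping service went critical), so Host2 is absent from the list.
incident = replay(["Host3", "Host4", "Host1"])

# What the original poster expected: Host2's host check fails as well.
expected = replay(["Host3", "Host4", "Host2", "Host1"])
```

In the `incident` run Host2 stays UP, so Host1 is classified as DOWN and notifies. In the `expected` run Host2 becomes UNREACHABLE first, and Host1 follows it into UNREACHABLE.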
