Is it possible to auto acknowledge problems (specific offline hosts)?

I am monitoring some hosts which aren’t online all the time. But it should be directly visible if they have the state DOWN, so I don’t want to add a rule which sets them permanently to UP. Instead, I would like to avoid producing problems which need to be acknowledged.

Is it possible to “Auto-acknowledge all host problems of hosts …” or something like “Ignore problems of hosts …”?

At the moment I have solved it by putting them into a 10-year downtime. This works, but does not feel right.

Hi,

so should the hosts ever notify anyone or is the goal that they be set to auto-acknowledge/downtime anyway?

the 10 year downtime doesn’t sound too wrong to me so far :slight_smile:
Gerd

I never tested that:

You can add a time period 00:00 - 00:00 and then use it in the rule Notification period for hosts.

Could you give it a try and let us know?

You can always exclude your hosts in the notification rules, but I guess you know that.

No

Yes. The target is: “Ok there are hosts DOWN, but don’t treat them as an unhandled problem.”

I tried this, but it’s still treated as a problem although it is now “Out of notification period”:

image

Translating the state to “UNREACHABLE” doesn’t work either:

“Out of service period” doesn’t work either:

image

I’m not sure, but I think I found the correct way; it just seems to be interpreted wrongly by CheckMK:

  • I changed the “Normal check interval for host checks” to 44 seconds
  • I changed the “Maximum number of check attempts for host” to 25000

As a result, a Host State Change is triggered every 110 seconds (44 seconds x 2.5 intervals), and because the maximum is 25000, the host state only changes from “SOFT (DOWN)” to “HARD (DOWN)” after around one month (25000 attempts x 110 seconds = 2,750,000 seconds, or about 31.8 days).
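The arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation from the post: the factor of 2.5 intervals per state change is the poster's own assumption, and the variable names are made up for illustration.

```python
# Values taken from the two rule settings described above.
check_interval_s = 44              # "Normal check interval for host checks"
intervals_per_state_change = 2.5   # factor assumed in the post, not verified
max_check_attempts = 25000         # "Maximum number of check attempts for host"

# Time between two counted check attempts, per the post's assumption.
seconds_per_attempt = check_interval_s * intervals_per_state_change

# Time until the attempt counter reaches the maximum and the state turns HARD.
total_seconds = max_check_attempts * seconds_per_attempt
days_until_hard = total_seconds / 86400  # 86400 seconds per day

print(f"{total_seconds:.0f} seconds = {days_until_hard:.1f} days")
```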

With that, the host check attempt counter counts upwards as expected:

But although the Host state is not “HARD (DOWN)” it is treated as a problem:

image

I mean why is a “SOFT (DOWN)” host a problem which needs action by the user? Sounds like a bug to me.

That’s the standard Nagios behavior. First it’s a soft alert, and after reaching the maximum check attempts it changes to a hard state. A hard state triggers a notification.

That’s what it is doing. What are you missing?

Yes, you can do this with an alert handler and livestatus.
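A rough sketch of what such an alert handler could send, assuming the standard Nagios external command `ACKNOWLEDGE_HOST_PROBLEM` is passed through Livestatus as a `COMMAND` line. The function names, the author and comment values, and the socket path are placeholders for illustration; how the handler receives the host name depends on your setup and is not shown here.

```python
import socket
import time


def build_ack_command(host, author, comment, now=None):
    """Build a Livestatus COMMAND line that acknowledges a host problem.

    Field order follows the Nagios external command
    ACKNOWLEDGE_HOST_PROBLEM;<host>;<sticky>;<notify>;<persistent>;<author>;<comment>
    """
    timestamp = int(time.time() if now is None else now)
    sticky = 2      # 2 = acknowledgement stays until the host is UP again
    notify = 0      # 0 = do not send an acknowledgement notification
    persistent = 0  # 0 = comment is not kept across core restarts
    return (
        f"COMMAND [{timestamp}] ACKNOWLEDGE_HOST_PROBLEM;"
        f"{host};{sticky};{notify};{persistent};{author};{comment}\n"
    )


def send_command(command, socket_path):
    """Send one command line to a Livestatus UNIX socket (path is site-specific)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode("utf-8"))


# Hypothetical usage inside an alert handler script:
# send_command(
#     build_ack_command("myhost", "automation", "expected to be offline"),
#     "/path/to/site/tmp/run/live",  # placeholder, adjust to your site
# )
```

The command builder is kept separate from the socket code so that the generated line can be inspected before anything is sent to the monitoring core.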

Can you explain in detail what you expect? What do you want to “ignore”?

Nothing. I only wanted to be clear that this is the desired state and I don’t want to translate it through a rule to always UP.

CheckMK should not treat a DOWN host as a problem. Not sure how to explain it differently ^^

Or maybe the workflow as an example:

Step 1: Check if there are any unhandled problems
image

Step 2: Check problems, solve them or acknowledge them with a comment
image

Yes, but the soft state already triggers an unhandled problem, which I think is incorrect.

Hmm… So I need to create a BASH script which sends a request to the CheckMK API? Wouldn’t be my favorite solution as not doable by all team members.

OK, I understand you better now.

In the dashlet Host Statistics you can add a filter to show only hard states:

Just make a clone of your main dashboard and share it with your colleagues.

I currently don’t know of a way to modify the Host Statistics in the sidebar.

Ok, this means I could replace the default “hostproblems” view with a custom view which contains an additional filter. If this is possible, it’s worth a try.
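To illustrate what such an additional filter boils down to, here is a sketch of the equivalent Livestatus query, built in Python. It assumes the `hosts` table columns `state`, `state_type`, `acknowledged` and `scheduled_downtime_depth`; the function name is made up, and the view itself would still be configured in the GUI rather than with this code.

```python
def hard_unhandled_down_hosts_query():
    """Return a Livestatus query for DOWN hosts that are in a HARD state,
    not acknowledged and not in a scheduled downtime."""
    lines = [
        "GET hosts",
        "Columns: name state state_type",
        "Filter: state = 1",                     # 1 = DOWN
        "Filter: state_type = 1",                # 1 = HARD, 0 = SOFT
        "Filter: acknowledged = 0",              # not yet acknowledged
        "Filter: scheduled_downtime_depth = 0",  # not in a downtime
    ]
    # Multiple Filter lines are combined with AND; a blank line ends the query.
    return "\n".join(lines) + "\n\n"


print(hard_unhandled_down_hosts_query())
```

With the `state_type` filter in place, hosts that are still in “SOFT (DOWN)” would no longer show up in the result.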

Hmm, this could cause confusion in the team “Ok, there are still problems left … loading list … no problems found”. “Hey mgutt, solve this bug in the sidebar” :sweat_smile:

Do you now agree with my opinion that “the soft state already triggers an unhandled problem, which I think is incorrect”? Then I would open a feature request regarding this.

As a remedy you may disallow this sidebar snapin in the user role. This way you force the user to use your custom dashboard. Maybe another community member knows a way to customize the snapin.

No, I do not agree.
The behavior is exactly what we have had for more than 20 years in Nagios, and the community is used to it working that
way. Nevertheless, in checkmk you can customize the views and dashboards to make it work as you expect.

I’d say … instead of fiddling with views, periods, soft/hard states, etc.:
just put these hosts in an “everlasting” downtime and all is fine. keep it simple.

On Enterprise edition, instead of a downtime for the next say 10 years, you could also create a “recurring downtime” rule for these hosts with say downtime for 1 month, repeating monthly.
Same result, but you (or your future colleagues) won’t have a surprise in 10 years from now in their then “legacy” monitoring system :wink:

From my perspective, I would just disable notifications for this host/service and exclude it from the default view of host/service problems. That is nearly as simple as an endless downtime, but more the right way in my opinion. This also enables you to search for these hosts/services, especially via host tags. I guess those are handily built in, if I am not wrong.