Change status of service to critical when Check_MK service is critical

Hello Folks,
I would like some of the services retrieved by the Checkmk agent to change their status to CRIT when the agent can no longer retrieve the data. In this case the Check_MK service changes to CRIT, but all of the host's other services stay OK and go stale. Is it possible to change the status of those services when they become stale?

Background:
We have a dashboard for our customers which displays only the services they are interested in, not necessarily the whole host with all of its services. I'm able to configure the dashboard to show only CRIT services, but not stale ones as well.

Does somebody have an idea how I could achieve that?

Many thanks in advance!

There is an option to show stale services as well. In your dashboard, edit the corresponding element and add a context/search filter for services, "Service is stale", and set it to "ignore".

You could create an alert handler (in cee) or notification (in cre) script that triggers if the Check_MK service is CRIT.

Then this alert handler could look up all passive services of that host via Livestatus and
set them to e.g. CRIT, also via Livestatus.

I personally would prefer UNKNOWN instead of CRITICAL.

Ingredients:

But that is just a creative untested idea.

:warning: But be aware, this could create a lot of unwanted additional alerts and notifications.

This works for me

OMD[mysite]:~$ cat local/share/check_mk/alert_handlers/auto-unknown-passive-checks.sh 

#!/usr/bin/env bash
# Set all passive services to UNKNOWN


# set input field separator to newline
IFS="
"

# exit if some conditions are not met
# is exit 1 the right thing to do?
# quote the variable: an unquoted empty value would make "[ -n ]" succeed
[ -n "$ALERT_HOSTNAME" ] || exit 1
[ "$ALERT_SERVICEDESC" = "Check_MK" ] || exit 1
[ "$ALERT_SERVICESTATE" = "CRITICAL" ] || exit 1
[ "$ALERT_WHAT" = "SERVICE" ] || exit 1
[ "$ALERT_ALERTTYPE" = "STATECHANGE" ] || exit 1
[ "$ALERT_HOSTNOTIFICATIONNUMBER" = "1" ] || exit 1

# we need a unix timestamp
now=$(date +%s)

# lookup the passive services
lq "GET services\nColumns: description\nFilter: active_checks_enabled = 0\nFilter: host_name = $ALERT_HOSTNAME\n" |
  sort |
  while read -r service; do
    # https://assets.nagios.com/downloads/nagioscore/docs/externalcmds/cmdinfo.php?command_id=114
    # PROCESS_SERVICE_CHECK_RESULT;<host_name>;<service_description>;<return_code>;<plugin_output>
    echo "COMMAND [$now] PROCESS_SERVICE_CHECK_RESULT;$ALERT_HOSTNAME;$service;3;UNKNOWN set by alert handler" | lq
  done

Note: perhaps it's even better to implement the above logic as inline Python instead of using an expensive bash script:

Checkmk provides the option to write alert handlers as Python functions, which then run inline, without process creation. (Source: Checkmk documentation, "Alert handlers - Responding to problems automatically".)
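For comparison, here is a rough, untested Python sketch of the same logic as the bash script above. It is written as a standalone script, not as an inline handler, because the inline handler function signature is not shown in this thread. Assumptions: the Livestatus socket lives at `$OMD_ROOT/tmp/run/live`, the `ALERT_*` variables are set exactly as in the bash version, and the helper names (`build_query`, `build_command`, `should_run`, `livestatus`) are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch: set all passive services of a host to UNKNOWN via Livestatus."""
import json
import os
import socket
import time


def build_query(host):
    # Same filters as the bash script, plus JSON output for easy parsing.
    return (
        "GET services\n"
        "Columns: description\n"
        "Filter: active_checks_enabled = 0\n"
        f"Filter: host_name = {host}\n"
        "OutputFormat: json\n"
        "\n"
    )


def build_command(host, service, now):
    # PROCESS_SERVICE_CHECK_RESULT;<host>;<service>;<return_code>;<output>
    return (
        f"COMMAND [{now}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};3;UNKNOWN set by alert handler\n"
    )


def should_run(env):
    # Mirrors the exit conditions of the bash script.
    return (
        bool(env.get("ALERT_HOSTNAME"))
        and env.get("ALERT_SERVICEDESC") == "Check_MK"
        and env.get("ALERT_SERVICESTATE") == "CRITICAL"
        and env.get("ALERT_WHAT") == "SERVICE"
        and env.get("ALERT_ALERTTYPE") == "STATECHANGE"
        and env.get("ALERT_HOSTNOTIFICATIONNUMBER") == "1"
    )


def livestatus(socket_path, payload, expect_reply=True):
    # One connection per request, like piping into `lq`.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(payload.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)
        if not expect_reply:
            return ""
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks).decode("utf-8")


def main():
    if not should_run(os.environ):
        return 0
    host = os.environ["ALERT_HOSTNAME"]
    # Assumed socket location inside an OMD site.
    socket_path = os.path.join(os.environ.get("OMD_ROOT", ""), "tmp/run/live")
    rows = json.loads(livestatus(socket_path, build_query(host)))
    now = int(time.time())
    for (service,) in sorted(rows):
        livestatus(socket_path, build_command(host, service, now), expect_reply=False)
    return 0


if __name__ == "__main__":
    main()
```

Run outside an alert handler context (no `ALERT_*` variables set), the script does nothing, which makes it safe to dry-run.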

If I fake a CRITICAL check result for the Check_MK service, or disable the agent controller on that host to provoke a CRITICAL Check_MK service, all passive checks are set to UNKNOWN:

root@jammy# systemctl stop cmk-agent-ctl-daemon.service

They also go stale after a while, but I like UNKNOWN & stale better than OK & stale.


Hi Janncek,
I have tried it, but when I have a dashboard with the following settings: Service hard states only "CRIT" and Service stale set to "ignore", I get only the CRIT stale services, but I would like to see all CRIT services and all stale services.

@mimimi
Thank you very much for your effort. As you mentioned, this solution causes many unwanted notifications because of the service state manipulation.

In my single dashboard view I would like to see only CRIT services and stale services (no matter which state they are in).

What I'm missing in the dashboard search filter under Service → Service Hard States is basically an additional stale option.

Is there another way I could achieve this?

Is this filter option the right one?
image

In all editions except Raw, which is the one I have, there is a rule called Service state translation (also Host state translation) that can change a state to something else based on conditions. Maybe that rule can help you change Stale into Critical.
Since Stale isn't really a state, but more of a 'modifier' on an existing state, it might not work. But it doesn't hurt to try.

Basically you want to combine the results of two different filtered views. One view with all states with the 'modifier' Stale set to yes, the other view with the bad states but with Stale set to no (to avoid double entries).
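Outside of the dashboard GUI, that combination can be expressed in a single Livestatus query with an `Or:` header. The following is only a sketch and does not solve the dashboard filter question itself. Assumptions: the services table exposes a `staleness` column, a service counts as stale once that value reaches 1.5 (adjust if your site uses a different threshold), and the function names are hypothetical.

```python
def build_crit_or_stale_query(stale_threshold=1.5):
    # "Or: 2" combines the two preceding Filter lines with a logical OR,
    # so each service appears once even if it is both CRIT and stale.
    return (
        "GET services\n"
        "Columns: host_name description state staleness\n"
        "Filter: state = 2\n"
        f"Filter: staleness >= {stale_threshold}\n"
        "Or: 2\n"
        "OutputFormat: json\n"
        "\n"
    )


def is_crit_or_stale(state, staleness, stale_threshold=1.5):
    # The same condition as a local predicate, e.g. for post-filtering rows.
    return state == 2 or staleness >= stale_threshold


# Print the query so it can be piped into `lq` on the site.
print(build_crit_or_stale_query(), end="")
```

Because the OR happens in one query, there is no need to exclude stale services from the CRIT part to avoid double entries.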

To combine the two views, perhaps Business Intelligence can help.

On topic: this is what @Janncek suggested earlier. (-;

Off topic: your avatar with Commander Keen just gave me a flash of fond memories of the game. (-:

Thanks for your solution. Indeed, it is similar to @mimimi's solution. The problem with it is that it will potentially produce a good number of alerts. Stale has a useful meaning of its own that I don't want to lose, but I would like to display stale services (regardless of their state) together with services in state CRIT in one view. But I guess that's not achievable in one view.

I now have the idea to create a counter for stale services. What is the best way to build a counter for filtered stale services?
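One possible way to get such a number outside the dashboard is a Livestatus `Stats:` query, which returns a count instead of rows. This is a sketch under assumptions: the `staleness` column and the 1.5 threshold are taken as the site defaults, the optional host filter stands in for whatever customer filter applies, and the function names are invented for illustration.

```python
import json


def build_stale_count_query(stale_threshold=1.5, host=None):
    lines = ["GET services"]
    if host is not None:
        # Optional restriction, e.g. to one customer's host.
        lines.append(f"Filter: host_name = {host}")
    # A Stats header counts the services matching the condition.
    lines.append(f"Stats: staleness >= {stale_threshold}")
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"


def parse_count(reply):
    # With one Stats header the JSON reply is a single row with one number.
    return json.loads(reply)[0][0]


# Print the query so it can be piped into `lq` on the site.
print(build_stale_count_query(), end="")
```

The resulting number could then be fed into a local check or any other mechanism that produces a metric for the dashboard.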

I'm open to any ideas since my dashboard skills are not great yet.

If you use the sidebar in your interface, the sidebar element Overview automagically shows stale counters when specific stale hosts or services are unhandled (read: not set in downtime or such).

The builtin view named uncheckedsvc shows stale services as a list, which can be found here: Monitor > Problems > Stale services. Same condition as sidebar counter.

Using Raw myself, the dashboard elements are very limited, so my dashboard skills are not great either. Unable to test the specific element, I assume you can use Metric > Single metric with the service filter Service is stale to get the dashboard counter you seek.

Edit: I thought I could test it out in the Checkmk Playground, but sadly users are not allowed to set up / customize / edit things there. :face_with_raised_eyebrow: