Best practice on handling notifications and severities

zaxxon · January 9, 2020, 9:54am

Hello,

I am new to check_mk and have the following questions that I couldn’t find an answer to. At the moment we are experimenting with the check_mk EE demo.

In our current monitoring solution (a Nagios derivative) we have named all services according to a special syntax that identifies them uniquely.
An event handler calls a script that checks them against a central text file (like a big table) with approx. 3k rows. There we can map any incoming criticality a plugin provides (Nagios exit codes like WARNING, CRITICAL) to a new one or just keep it as it is, before we pass the notification via a message client to our event management software on a remote host. That is quite fast, and the script also checks the syntax, looks for duplicates, etc. to avoid errors from editing.

This script and mapping file give us an easy way to centrally handle severities per service (for example, converting a critical to a minor so that no alert goes out to a cell phone) and also to decide which contact the alert is routed to. We also add a short message so that alerted customers get more helpful information, since in some cases the message from the plugin is not sufficient for the user who gets the alert.
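
To make the idea concrete, here is a minimal sketch of such a lookup. The real mapping file is not shown in this thread, so the semicolon-separated layout (service ID; incoming severity; new severity; hint text) and the function names `remap_severity` and `find_duplicates` are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only. Assumed (hypothetical) mapping file columns:
#   service-id;incoming-severity;new-severity;hint

# Print the remapped severity for a service.
# Falls back to the incoming severity if no row matches.
remap_severity() {
    # $1 = mapping file, $2 = service id, $3 = incoming severity
    awk -F';' -v id="$2" -v sev="$3" '
        $1 == id && $2 == sev { print $3; found = 1; exit }
        END { if (!found) print sev }
    ' "$1"
}

# Print each service-id/severity pair that occurs more than once,
# so editing mistakes are caught before the file is used.
find_duplicates() {
    # $1 = mapping file
    awk -F';' 'seen[$1 FS $2]++ == 1 { print $1 FS $2 }' "$1"
}
```

With a row like `DB_TBS_FULL;CRITICAL;MINOR;Tablespace nearly full` (a made-up service ID), `remap_severity mapping.txt DB_TBS_FULL CRITICAL` would print the new severity, while an unlisted service keeps whatever the plugin reported.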

So far I have figured out that we could handle routing to customers just with contact groups.
For setting criticalities per service, I did not find a way in check_mk.
We have about 1.5k hosts with 3k unique services, which correspond to approx. 60k services on the hosts. We are looking for a central way, similar to that mapping file, to handle criticality per service ID (the 3k lines in the mapping file mentioned above).

To feed the script that checks the mapping file, every notification in the Nagios derivative triggers the script via an event handler and hands over parameters such as the unique name of the service, the severity, the message, etc.

If there were some mechanism inside check_mk to handle this, we could stop using the external script and mapping file.

Otherwise, we also found the alert handlers in check_mk and added a script in the appropriate directory. But is it possible to pass variable information, such as a unique ID for a service, to a script called this way?
In case nothing similar to what we use is already built into check_mk, we would simply use check_mk’s alert handlers to call the script and mapping file as we do in the current solution.

I hope that explanation was not too quirky.

Thanks for any hint in advance!

Cheers
Markus

ChristianM · January 9, 2020, 12:43pm

Hello Markus,
you will be able to write your own notification script with the enrichment you need; how far to go depends on your requirements. In checkmk, service names are usually derived from the plugin or instance that is used. You should look at the environment of the notification plugin to see which variables you need.
You can put a small script in ~/local/share/check_mk/notifications, for example:

#!/bin/sh
#Test Environment out
env >> /tmp/out

Now you can use this notification plugin to inspect the environment and then decide how to do the enrichment.
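
Once /tmp/out shows which variables are available, a notification script can read them directly. A minimal sketch, assuming the usual `NOTIFY_`-prefixed variables such as `NOTIFY_HOSTNAME`, `NOTIFY_SERVICEDESC`, `NOTIFY_SERVICESTATE` and `NOTIFY_SERVICEOUTPUT` show up in your dump. The function name `format_event`, the `EVENT_LOG` variable and the pipe-separated layout are hypothetical, so adjust them to whatever your event management software expects:

```shell
#!/bin/sh
# Forward event (sketch)
# Assumption: the notification context arrives as NOTIFY_* environment
# variables; verify the exact names against your own env dump first.

# Build one pipe-separated line from the notification context.
# Missing variables fall back to placeholder values.
format_event() {
    printf '%s|%s|%s|%s\n' \
        "${NOTIFY_HOSTNAME:-unknown}" \
        "${NOTIFY_SERVICEDESC:-unknown}" \
        "${NOTIFY_SERVICESTATE:-UNKNOWN}" \
        "${NOTIFY_SERVICEOUTPUT:-}"
}

# Only write something when a notification context is present.
# EVENT_LOG is a hypothetical override for the output file.
if [ -n "${NOTIFY_HOSTNAME:-}" ]; then
    format_event >> "${EVENT_LOG:-/tmp/events.out}"
fi
```

Instead of appending to a file, the last step could call your existing mapping script or message client with the same values.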

Regards, Christian

zaxxon · January 9, 2020, 1:15pm

Hi Christian,

this helps, thanks a lot!

Cheers
Markus