Debugging CheckMK notification plugins

Hi everyone,

I am currently trying to write a custom notification plugin for cmk. It should send the notification via REST to another application.

I am having trouble finding “real” documentation for exactly that kind of thing. I started by:

  1. Copying share/check_mk/notifications/servicenow to local/share/check_mk/notifications/myservicenow
  2. Changing line 3 from # Servicenow (Enterprise only) to # My Servicenow example
  3. Adding the following to the end of the file:
print("outside: about to check for __main__ == %s" % __name__)
if __name__ == "__main__":
    __pickle(str("a"), name="__main__")
    print("outside: about to call main()")
    sys.exit(main())

I got these instructions from this KB article.
Generally, it seems to work. But, as always when developing something new, I have to change things frequently, re-run the script, check the results, and repeat.

I am facing the following issues (aka “My Questions” :wink:):

I added a function to my notification plugin that creates a sub-folder in /tmp using pathlib.Path.mkdir() and uses pickle to save the request and JSON objects to files, so that I have a chance to analyse what’s going on:

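from typing import Optional

def __get_debug_path(marker: Optional[str] = None): # noqa
    from pathlib import Path
    # Directory that collects all debug artefacts of this plugin.
    debug_dir = Path("/tmp/cmk_myservicenow")
    debug_dir.mkdir(exist_ok=True)
    if marker is not None:
        # Create an empty marker file to show that this point was reached.
        with open(debug_dir.joinpath('%s.marker' % marker), 'w') as _:
            pass
    return debug_dir

def __pickle(obj, name: str): # noqa
    import pickle
    # Serialise obj to <debug_dir>/<name>.pickle for later inspection.
    with open(__get_debug_path(name).joinpath("%s.pickle" % name), "wb") as f:
        pickle.dump(obj, f)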

From the main() function, I then call this multiple times for interesting object extraction:

  • __pickle(context, name="context")
  • __pickle(context.json_for_event, name="context_json_for_event")
  • __pickle(response, name="response")

The idea behind that is clear, but it just does not happen: the folder in /tmp is not created, and the pickle files are not written. At the same time, no exception is thrown, or at least I cannot find one in the logs.

I then tried to find out why this was not working: I changed lots of things, restarted the site multiple times, and deleted all __pycache__ folders. During my search, I even changed the path the files are written to. After hours, the folder was finally created, but at the path that was previously in the script and not the current one … :wink:
It feels like there is some really weird caching going on, or the script is executed by another satellite that does not always receive the latest version.

Long story short:
I don’t understand how cmk executes these notification plugins:

  • Is it executing them as a script? Or does it import specific classes and functions only?
  • Where do these scripts get their parameters from for the content of the notifications?
  • Which node executes these notification plugins? Is it the central cmk instance, or the satellite that the affected host is linked to?
  • What logs are potential notification script exceptions written to?
  • Is there anything to keep in mind when changing a notification script? I think I have already identified that something in the notification rules (like the Comment field) must be changed to trigger pending changes for all sites, so that the latest version of the script is distributed via configuration replication. Is there anything else that needs to be taken care of?

You may want to read about notification plugins in the documentation: Notifications - via Email, SMS, ticket system and more

Thanks for pointing that one out - somehow I managed to overlook this.

So, it answers the questions:

Q: Is it executing them as a script? Or does it import specific classes and functions only?
A: Yes, as a script.

Q: Where do these scripts get their parameters from for the content of the notifications?
A: From environment variables prefixed NOTIFY_.
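A minimal sketch of how a plugin could collect these parameters, assuming only that every value arrives as an environment variable whose name starts with NOTIFY_. The function name collect_context is made up for illustration and is not part of cmk:

```python
import os
from typing import Dict, Mapping, Optional


def collect_context(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return all NOTIFY_* variables with the prefix stripped from the key."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if environ is None else environ
    prefix = "NOTIFY_"
    return {
        key[len(prefix):]: value
        for key, value in env.items()
        if key.startswith(prefix)
    }
```

Calling collect_context() without arguments reads the real environment; passing a dict makes it easy to try the function outside of cmk with hand-made sample values.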

Q: What logs are potential notification script exceptions written to?
A: Standard output is written to ~/var/log/notify.log (I guess this includes standard error, where exceptions go, but I haven’t checked yet).
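To avoid depending on where standard error ends up, a plugin could catch exceptions itself and print the traceback explicitly. This is only a sketch: run_guarded is a hypothetical helper, and the return value 2 is an assumption standing in for "some non-zero exit code", so check the documentation for the exact exit code semantics.

```python
import sys
import traceback
from typing import Callable


def run_guarded(main: Callable[[], int]) -> int:
    """Run main() and report any exception on stderr instead of failing silently."""
    try:
        return main()
    except Exception:
        # Print the full traceback to stderr so it appears wherever the
        # caller collects the script's output.
        traceback.print_exc(file=sys.stderr)
        # Assumption: a non-zero value signals failure to the caller.
        return 2
```

At the end of the plugin this would replace the direct call, i.e. sys.exit(run_guarded(main)) instead of sys.exit(main()).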

But it leaves the remaining questions unanswered:

  1. Which node executes these notification plugins? Is it the central cmk instance, or the satellite that the affected host is linked to?
  2. Is there anything to keep in mind when changing a notification script? I think I have already identified that something in the notification rules (like the Comment field) must be changed to trigger pending changes for all sites, so that the latest version of the script is distributed via configuration replication. Is there anything else that needs to be taken care of?

OK, I will answer my own questions for the sake of completeness:

  1. Which node executes these notification plugins? Is it the central cmk instance, or the satellite that the affected host is linked to?
  2. Is there anything to keep in mind when changing a notification script? I think I have already identified that something in the notification rules (like the Comment field) must be changed to trigger pending changes for all sites, so that the latest version of the script is distributed via configuration replication. Is there anything else that needs to be taken care of?

Answer to 1:
It depends on the site-specific global settings under “Notifications”. By default, the script is executed locally, but a site can also be configured to delegate notifications to a central site. The notification is always created by the core of the site that is configured to monitor the problematic host. So yes: if you have a distributed monitoring setup with multiple satellite nodes, the script may be executed on one of those nodes, in which case you won’t find the debugging files created by your script on the central node.
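One way to find out which node actually ran the script is to let the script record it. The following is a rough sketch; the helper name write_execution_marker and the file name executed_on.txt are made up for illustration:

```python
import os
import socket
import time
from pathlib import Path


def write_execution_marker(directory: Path, script_path: str) -> Path:
    """Append one line naming the host and the script file of this run."""
    directory.mkdir(parents=True, exist_ok=True)
    marker = directory / "executed_on.txt"
    line = "%s host=%s script=%s\n" % (
        time.strftime("%Y-%m-%d %H:%M:%S"),
        socket.gethostname(),
        os.path.abspath(script_path),
    )
    # Append, so that repeated runs on the same node stay visible.
    with marker.open("a") as f:
        f.write(line)
    return marker
```

Inside the plugin, write_execution_marker(Path("/tmp/cmk_myservicenow"), __file__) would leave the marker on whichever node executed the notification, so looking for the file on each node answers the question directly.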

Answer to 2:
Nothing beyond what is already in the question: if the script is executed on the central node, there is nothing more to do once the script is in place.
Again, if you have multi-node distributed monitoring, the changed script must be copied to the node that executes the notification. Among other things, this happens when changed notification configs are applied; changing the comment is sufficient to trigger this.
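To check whether a node is really running the latest copy of the script, the script could log a short fingerprint of its own file. This is a sketch under the assumption that comparing file hashes is good enough; script_fingerprint is a hypothetical helper name:

```python
import hashlib
from pathlib import Path


def script_fingerprint(path: Path) -> str:
    """Return the first 12 hex digits of the SHA-256 hash of the file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:12]
```

Printing script_fingerprint(Path(__file__)) on every run, and comparing it with the value computed for the edited file on the central site, shows immediately whether a stale version was executed.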
