How to write CheckMK Checks

Hi everyone,

I know the topic title could hardly be more generic, but I chose it on purpose.

I have now spent several days on the first baby steps of writing a CheckMK plugin, but I cannot even get hardcoded scripts to work, let alone think about proper WATO integration …
All resources I have found so far amount to something along the lines of …

  • lookup examples in CheckMK Exchange
  • look at the Sphinx-docs (shipped with any instance of CheckMK), which are very sparse
  • lookup something here in the forum

And that’s what CheckMK calls “Developer Docs” - so far! Recent live-stream events have shown that even CheckMK’s internal devs struggle with the lack of straightforward docs (see Plug-in Migration Livestream Announcement or Developer Hour: New Plug-in Developer APIs Announcement).

What I’d love to see in this topic is an up-to-date, end-to-end summary of what must be done to develop a CheckMK plugin, without references to articles from 2017 or reverse-engineering from CheckMK Exchange.
It’s not that I am too lazy to read; it’s that I can’t get to any result and am constantly wondering how exactly multiple things are meant.

I’d like to achieve the following with this thread: Learning to write a Check Plugin, which sends an HTTP Query to some Web-API and returns the number of lines returned from that API as a metric. The required Authentication Token to access the resource should come from CheckMK’s Password-Store and apart from that, only the URL of the remote API should be configurable in the WebUI.

Doesn’t sound too exotic, does it?
Pure-Python code, without any CheckMK would look like this:

#!/usr/bin/env python3
import httpx
from typing import Tuple
from argparse import ArgumentParser, Namespace

def parse_args() -> Namespace:
    parser = ArgumentParser(
        description="Part of CheckMK Plugin 'dhcpac_api'.",
    )
    parser.add_argument(
        "-u", "--url",
        help="The URL of the API to query.",
        action="store",
        required=True,
    )
    parser.add_argument(
        "-t", "--token",
        help="The Token to use to auth against the API.",
        action="store",
        required=True,
    )
    return parser.parse_args()


def get_records_from_api(url: str, token: str) -> Tuple[int, int]:
    headers = {"Authorization": f"Bearer {token}"}
    response = httpx.get(url, headers=headers)
    status_code = response.status_code

    record_count = 0
    if status_code == 200:
        try:
            # Skip the first line: it is the CSV header, not a real record.
            # splitlines() counts lines; split() would count whitespace-separated tokens.
            record_count = len(response.content.decode().splitlines()[1:])
        except (UnicodeDecodeError, IndexError, UnicodeError):
            pass

    return status_code, record_count

def main():
    args = parse_args()
    status_code, record_count = get_records_from_api(url=args.url, token=args.token)
    print(f"{status_code};{record_count}")

if __name__ == "__main__":
    main()

Nothing too exotic.

From what I learned from the Developing extensions for Checkmk resource, a Special Agent would be the best fit, since these are executed on the CheckMK server and don’t require an agent to be installed anywhere.

Here comes the first question already: Is that correct? Or is a “Native agent-based check plug-in” or a “Local check” a better match? If so: Please explain why.

Based on the new CheckMK DevAPI v2, I understood that such a 3rd party Special Agent would have to be placed in ~/local/lib/python3/cmk_addons/plugins/<FAMILY_NAME> as its main path.
This is already the 2nd question: Is that the correct location? If not: Which is?

According to the Sphinx docs, a Special Agent needs a subfolder of that main path named server_side_calls, containing a script that defines an object whose name begins with special_agent_ and which is an instance of either SpecialAgentConfig() or SpecialAgentCommand().
Both expect an executable in the libexec subfolder of the main path, named agent_ followed by the “name” field of SpecialAgentConfig()/SpecialAgentCommand(); so, for example:

special_agent_dhcpac_api = SpecialAgentConfig(
    name="dhcpac_api",
    ...
)

would expect an executable script at ~/local/lib/python3/cmk_addons/plugins/<FAMILY_NAME>/libexec/agent_dhcpac_api.
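
The naming convention described above can be sketched in a few lines of plain Python. This only illustrates the layout as understood in this post, not a confirmed rule; the family name dhcpac and the module file name dhcpac_api.py are placeholders chosen for the example.

```python
from pathlib import PurePosixPath

# Base directory for third-party plug-ins, relative to the site user's home,
# as quoted in this post.
PLUGIN_BASE = PurePosixPath("local/lib/python3/cmk_addons/plugins")


def special_agent_paths(family: str, agent_name: str) -> dict[str, PurePosixPath]:
    """Return the two locations this post describes for a special agent."""
    family_dir = PLUGIN_BASE / family
    return {
        # Python module defining the special_agent_<name> object;
        # the file name is a free choice here (assumption).
        "server_side_calls": family_dir / "server_side_calls" / f"{agent_name}.py",
        # Executable named "agent_" followed by the configured name.
        "libexec": family_dir / "libexec" / f"agent_{agent_name}",
    }


# "dhcpac" is a hypothetical family name used only for illustration.
paths = special_agent_paths(family="dhcpac", agent_name="dhcpac_api")
for role, path in paths.items():
    print(f"{role}: ~/{path}")
```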

Even more questions so far: Is all of this correct?

Before I make this monster of a post even more complicated, I’ll pause here and wait for feedback on what I wrote so far, so that I don’t babble more nonsense in case anything up to here was wrong.

I’m looking forward to your answers! After I have sorted everything out, I will make a blog post from the summary, hopefully helping others as well.


Hi @The-Judge

when it comes to resources, have you also looked at the actual docs?

With this requirement, you are right that the only options are a special agent or a classic Nagios check.
At the moment I have only one simple special agent and check.

Here you see what is needed for such a construct:

  • checks (agent_based)
  • stub file for special agent (libexec)
  • WATO rules to configure and assign the agent (rulesets)
  • script to build the agent call (server_side_calls)
  • special agent itself (special_agents)

This is the general problem with special agents - you need way more things than a simple SNMP check.


Hi Elias,

thanks for your reply!

Yes, I have! And that’s exactly the problem: You are pointing at the only docs available, and none of them explains to a developer what exactly has to be done:

Developing extensions for Checkmk

This only explains that there are different types of checks and leads to the conclusion that, for the example I provided, a special agent is what you want. A sub-link to Datasource programs - Monitoring devices without access to an operating system describes in a bit more detail what a special agent is and how it works from a user perspective (people who want to configure an existing special-agent-based check). But it gives a developer no clues about how plugins are written, with the sole exception of the “correct” location to save custom special agent checks: ~/local/share/check_mk/agents/special/.
But the funny thing is: Even this tiny bit of a first idea seems to be wrong, since with DevAPI 2 and recent changes shipped with CMK 2.3 and plans for 2.4 announced, other resources say it has to be in ~/local/lib/python3/cmk_addons/plugins/<FAMILY_NAME>/server_side_calls/whatever.py AND ~/local/lib/python3/cmk_addons/plugins/<FAMILY_NAME>/libexec/<SPECIALAGENTCONFIG_NAME>.

Writing agent-based check plug-ins

This is only about agent-based plugins, which work entirely differently from special agent plugins.

And as I already said: The special agent docs rely largely on declaring agent_netapp_ontap as a reference/example rather than providing any developer documentation.

So: Where are the docs in this? How can anyone think this is even close to a sufficient developer documentation?


This is only about agent-based plugins, which work entirely differently from special agent plugins.

Here you need to start your investigation by getting an understanding of how Checkmk works. Agents and special agents do not work entirely differently; they are just executed in different places. Agents are (usually) executed on the monitored system, while special agents are executed outside the monitored system, e.g. by calling an API to fetch the data.
The check plugin is not aware if the data is provided by the “regular” agent or a special agent.
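
As a rough illustration of that point, here is a minimal sketch of a parse function in plain Python. It assumes a hypothetical section that contains a single line with two columns (HTTP status code and record count); the function name and the column layout are assumptions made for this example. In a real plug-in the function would be registered through the agent-based API, which is not shown here.

```python
from typing import Optional

# A section reaches the parse function as a "string table":
# one list per line, already split into columns.
StringTable = list[list[str]]


def parse_dhcpac_api(string_table: StringTable) -> Optional[dict[str, int]]:
    """Turn the single 'status;count' line into a dictionary."""
    if not string_table or len(string_table[0]) != 2:
        return None  # empty or malformed section
    try:
        status_code, record_count = (int(col) for col in string_table[0])
    except ValueError:
        return None  # non-numeric columns
    return {"status_code": status_code, "record_count": record_count}


# The same table could come from an agent on the host or from a special agent;
# the parse function cannot tell the difference.
print(parse_dhcpac_api([["200", "42"]]))
```

Nothing in the function depends on where the data was collected, which is why the same check plug-in can sit behind either kind of agent.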

How can anyone think this is even close to a sufficient developer documentation?

I guess you’re referring to beginner documentation. As you already (correctly) stated, there is developer documentation in Checkmk itself for the server side call API. This reference documentation even includes a quick guide containing a skeleton for your first server side call implementation.
The docs are meant to give entry-level users a more guided way to discover the possibilities of these APIs. Their purpose is not to replace (or duplicate) the Sphinx docs reference.
And here we’re working on an article that gives more guided help using an example special agent or active check. Please note that these articles will not have the entry-level style of the check plugin articles but are aimed at more experienced Python developers.

I understand that the current situation is often confusing. But this is for the sake of more consistency and reliability of our APIs and Software. Once the changes are more settled, we all will profit from these changes!
Meanwhile, we will continue to add more guides for our new and modified APIs.


I have to side with @The-Judge here.

The new API documentation is very welcome. The check_mk docs also provide valuable information. However, they lack an overview of the various components of the different check types: how do they tie together, and how do they tie in with the check_mk core itself?

E.g. the docs mention that executables go into the libexec folder. But there is no mention that this is where the actual special agent is meant to be placed, or that it needs to write the fetched data to stdout in the same format as a regular agent installed on the monitored system.
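
To illustrate what “the same format as a regular agent” means, here is a minimal sketch, assuming a section named dhcpac_api with semicolon-separated columns. The section name and the values are made up for the example; the `<<<name>>>` header and the sep() option with an ASCII code are the usual agent section syntax.

```python
import sys


def format_section(name: str, rows: list[list[str]], sep: str = ";") -> str:
    """Render rows as one agent section, using `sep` as the column separator."""
    # The header carries the section name; sep(<ASCII code>) declares
    # which character separates the columns (59 is ';').
    header = f"<<<{name}:sep({ord(sep)})>>>"
    lines = [sep.join(row) for row in rows]
    return "\n".join([header, *lines]) + "\n"


if __name__ == "__main__":
    # Hypothetical values: HTTP status code and record count from the API.
    sys.stdout.write(format_section("dhcpac_api", [["200", "42"]]))
```

Run on its own, the script prints the header line followed by one data line, which is all a special agent has to produce on stdout.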

Because of this, I still find myself experimenting by trial and error until I understand how the components interact.

I would also welcome some kind of doc for the API of the check_mk core, where we could, e.g., get the configuration data of a host, and some info on how to make use of the SNMP context feature. Nowadays almost every vendor has some hardware virtualization technology, and interacting with the virtual resources often requires contexts. I couldn’t get it working with the currently available information.

Maybe these things are obvious to an experienced developer; however, I think the people developing plugins are mostly system owners who would like to get their stuff monitored the way they want.

Thank you for your work
