Slurm grid checks

I’ve written (with some help from gemini.google and the check_mk chatbot) a plugin to add some slurm grid checks on my slurm controller node. The output in check_mk_agent looks like:

<<<slurm>>>
0 slurm_node_blade-04-01 - OK - blade-04-01 is idle
0 slurm_node_blade-04-02 - OK - blade-04-02 is idle
0 slurm_node_blade-04-03 - OK - blade-04-03 is idle
<snipped out 89 other nodes>
0 slurm_node_states idle_nodes=91|mixed_nodes=0|allocated_nodes=0|down_nodes=0|other_nodes=0 - OK - All nodes are idle, mixed or allocated.
0 slurm_slurmctld_service - OK - slurmctld on gridboss is active.

That’s working nicely. The next step is to get check_mk to see the section and discover the services, and that’s where I’m getting stuck. The Python code that I’m getting out of gemini and the chatbot seems okay, but running a new inventory (service discovery) on the client never shows any slurm services.
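For anyone comparing notes, here is a minimal sketch of the parse, discovery and check logic such a plugin needs, written in plain Python so it can be run outside the site. It is not the attached slurm.py; the function names and the dict layout are hypothetical. As far as I understand the cmk.agent_based.v2 API, the section name passed to AgentSection has to match the <<<slurm>>> header, and the module-level variables have to start with agent_section_ and check_plugin_ or the plugin is not picked up, so those are worth double-checking when discovery finds nothing.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical layout: service name -> {"state", "metrics", "summary"}
Section = Dict[str, dict]


def parse_slurm(string_table: List[List[str]]) -> Section:
    """Turn whitespace-split agent lines into a dict keyed by service name."""
    parsed: Section = {}
    for words in string_table:
        # Expect "<state> <name> <perfdata or -> <text...>"; skip anything else.
        if len(words) < 3 or not words[0].isdigit():
            continue
        metrics: Dict[str, float] = {}
        if words[2] != "-":
            for pair in words[2].split("|"):
                key, _, value = pair.partition("=")
                # Ignore optional ";warn;crit" suffixes after the value.
                metrics[key] = float(value.split(";")[0])
        text = words[3:]
        if text and text[0] == "-":
            text = text[1:]  # drop the separator that follows real perfdata
        parsed[words[1]] = {
            "state": int(words[0]),
            "metrics": metrics,
            "summary": " ".join(text),
        }
    return parsed


def discover_slurm(section: Section) -> Iterator[str]:
    """Yield one item per service found in the section."""
    for name in sorted(section):
        yield name  # in a real plugin: yield Service(item=name)


def check_slurm(item: str, section: Section) -> Tuple[int, str]:
    """Return (state, summary) for one item; 3 means UNKNOWN."""
    data = section.get(item)
    if data is None:
        return 3, f"{item} missing from agent output"
    # in a real plugin: yield Result(state=State(...), summary=...)
    return data["state"], data["summary"]


# Wiring inside the site (not executed here; assumes cmk.agent_based.v2):
#   agent_section_slurm = AgentSection(name="slurm", parse_function=parse_slurm)
#   check_plugin_slurm = CheckPlugin(
#       name="slurm",
#       service_name="Slurm %s",
#       discovery_function=discover_slurm,
#       check_function=check_slurm,
#   )
```

If the parse function returns an empty dict (or the section name does not match), the discovery function has nothing to yield, which would look exactly like "no slurm services found".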

The eventual goal is to have all the nodes monitored for their state within the slurm grid, keep an eye on the slurmctld service, and provide some perf data about node states in a color-coded stacked area graph.

My first run at this was with a local check that worked, but I couldn’t get the graph part of it working. I always ended up with separate graphs for the five metrics. I decided to try working up a plugin to see if that would yield different results, and hit this roadblock.
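A rough sketch of how the local check lines can be produced, for reference. It only formats the output; the (node, state) pairs are assumed to come from something like sinfo, and the function name, the state buckets and the WARN/CRIT mapping are my own assumptions rather than what the original script does.

```python
from collections import Counter
from typing import Iterable, List, Tuple

KNOWN = ("idle", "mixed", "allocated", "down")
LABELS = ("OK", "WARN", "CRIT")


def node_state_lines(nodes: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one local-check line per node plus a summary line with perfdata."""
    counts: Counter = Counter()
    lines: List[str] = []
    for node, raw_state in nodes:
        # Strip flag characters such as the trailing "*" in "idle*".
        state = "".join(ch for ch in raw_state if ch.isalpha())
        bucket = state if state in KNOWN else "other"
        counts[bucket] += 1
        # Assumption: down is CRIT, anything unrecognised is WARN.
        status = 2 if bucket == "down" else 1 if bucket == "other" else 0
        lines.append(
            f"{status} slurm_node_{node} - {LABELS[status]} - {node} is {state}"
        )

    perf = "|".join(f"{name}_nodes={counts[name]}" for name in KNOWN + ("other",))
    overall = 2 if counts["down"] else 1 if counts["other"] else 0
    if overall == 0:
        text = "All nodes are idle, mixed or allocated."
    else:
        text = f"{counts['down']} down, {counts['other']} in other states."
    lines.append(f"{overall} slurm_node_states {perf} - {LABELS[overall]} - {text}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample input; a real script would read the node list from Slurm.
    for line in node_state_lines([("blade-04-01", "idle"), ("blade-04-02", "idle")]):
        print(line)
```

All five counters sit on the one slurm_node_states service, so they arrive as five metrics of a single service.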

Gonna revert to the local checks for now and hope somebody can tell me where I’m going wrong.

The current version that I’ve gotten to is:
/opt/omd/sites/cmk7309/local/lib/python3/cmk_addons/plugins/slurm/agent_based/slurm.py
slurm.py (1.7 KB)

The site is 2.4.0p11.cre running on an Alma Linux 9.6 system.

Unless I’m missing something, I’m not gonna be able to get the combined graph without either moving up from the CRE to one of the paid versions, which isn’t likely an option due to budget constraints, or setting up grafana and hooking it into check_mk.

I was able to change the colors on the graphs but can’t get the combined graph to work.

Hi @BiloxiGeek ,

We plan a DevHour around November, so if there are no answers till then, maybe you could bring it up then? I know it is not soon, so not a perfect solution, but maybe it would help.

I did get grafana set up on the same cmk system. It’s working fairly well so far.

Have to link out to external resources to see the graph but that’s not a big deal at this point.