After adding two Pacemaker nodes to the monitoring, both report the warning: “Heartbeat CRM General UNKN - check failed - please submit a crash report!”
The clients are running check-mk-agent 1.6.0p17. The server is running OMD - Open Monitoring Distribution 1.6.0p17.
Crash Report:
Exception: KeyError ('num_resources')
Traceback:
File "/omd/sites/Koronea/lib/python/cmk_base/checking.py", line 320, in execute_check
  raw_result = check_function(item, determine_check_params(params), section_content)
File "/omd/sites/Koronea/share/check_mk/checks/heartbeat_crm", line 194, in check_heartbeat_crm
  if params['num_resources'] is not None:
What causes the check to throw this warning/error? How can I resolve this problem?
Some more information below.
Agent output:
<<<check_mk>>>
Version: 1.6.0p17
AgentOS: linux
Hostname: srv-01-sapconfig01
AgentDirectory: /etc/check_mk
DataDirectory: /var/lib/check_mk_agent
SpoolDirectory: /var/lib/check_mk_agent/spool
PluginsDirectory: /usr/lib/check_mk_agent/plugins
LocalDirectory: /usr/lib/check_mk_agent/local
OnlyFrom: 10.1.1.39 127.0.0.1 #10.0.20.1 10.0.20.2
… …
<<<heartbeat_crm>>>
Stack: corosync
Current DC: srv-01-sapconfig01 (version 1.1.19+20181105.ccd6b5b10-3.16.1-1.1.19+20181105.ccd6b5b10) - partition with quorum
Last updated: Wed Sep 9 10:47:47 2020
Last change: Fri Aug 21 21:22:12 2020 by hacluster via crmd on srv-01-sapconfig01
2 nodes configured
11 resources configured
Online: [ srv-01-sapconfig01 srv-01-sapconfig02 ]
Full list of resources:
Clone Set: cl-storage [g-storage]
_ Started: [ srv-01-sapconfig01 srv-01-sapconfig02 ]
Clone Set: cl-nfsserver [nfsserver]
_ Started: [ srv-01-sapconfig01 srv-01-sapconfig02 ]
Resource Group: grp_NFS
_ rsc_VG_vgnfs (ocf:LVM): Started srv-01-sapconfig01
_ rsc_IP_nfs (ocf:IPaddr2): Started srv-01-sapconfig01
_ rsc_FS_vgnfs (ocf:Filesystem): Started srv-01-sapconfig01
_ exportfs (ocf:exportfs): Started srv-01-sapconfig01
stonith-sbd (stonith:external/sbd): Started srv-01-sapconfig01
It looks like the inventory (discovery) function has not created the correct parameters for your discovered services.
Can you please run a test on the command line?
cmk --debug -vvII hostname
Once this has run without errors, please have a look at the autochecks file that was created in
~/var/check_mk/autochecks/hostname.mk
In this file, the line for the “heartbeat_crm” check should also contain a parameter entry.
Is this so?
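If the file is long, a few lines of Python can pull out the relevant entries. This is only a sketch: it assumes nothing about the exact layout of the autochecks file and simply lists every line that mentions the check. The function name find_check_lines is made up for illustration, and hostname is a placeholder for the real host name.

```python
import os


def find_check_lines(path, check_name="heartbeat_crm"):
    """Return (line_number, text) for every line in path that mentions check_name."""
    matches = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if check_name in line:
                matches.append((number, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    # Run as the site user; "hostname" is a placeholder for the real host name.
    autochecks = os.path.expanduser("~/var/check_mk/autochecks/hostname.mk")
    if os.path.exists(autochecks):
        for number, text in find_check_lines(autochecks):
            print("%d: %s" % (number, text))
    else:
        print("no autochecks file at %s" % autochecks)
```

Each printed line should show a parameter entry next to the check name.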
This looks OK: “num_resources” exists here as a parameter.
If you run “cmk --debug -vvn hostname” on the command line, do you get the same error as before?
Does the error also happen if you run “cmk --debug -vvII hostname” for the affected host?
The “params[‘num_resources’]” value is set at discovery time. If it is missing then something went wrong with the discovery of the host.
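To illustrate what "set at discovery time" means, here is a rough sketch of how such values could be derived from the agent section. It assumes the counts are taken from the "N nodes configured" and "N resources configured" lines shown in the agent output; that is an assumption for illustration, not a copy of the real plugin code, and count_params is a hypothetical name.

```python
import re

# Matches lines such as "2 nodes configured" or "11 resources configured".
COUNT_LINE = re.compile(r"^(\d+)\s+(nodes|resources)\s+configured", re.IGNORECASE)


def count_params(section_lines):
    """Build a params-like dict from the lines of a heartbeat_crm section."""
    params = {"num_nodes": None, "num_resources": None}
    for line in section_lines:
        match = COUNT_LINE.match(line.strip())
        if match:
            # "nodes" -> "num_nodes", "resources" -> "num_resources"
            params["num_" + match.group(2).lower()] = int(match.group(1))
    return params


# A few lines taken from the agent output posted above.
sample = [
    "Stack: corosync",
    "2 nodes configured",
    "11 resources configured",
]
print(count_params(sample))
```

If discovery never gets to store these values, the check has nothing to compare against later.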
What do you see next to the service if you run “cmk -D hostname”?
There you should see the parameters for this service.
At a minimum you should see something like "max_age": 60 there, but "num_nodes": ... and "num_resources": ... must also be present.
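The same comparison can be written down as a small check. This is a sketch only: missing_keys is a hypothetical helper, and the values 60, 2 and 11 are illustrative (the counts are borrowed from the agent output above), not read from your site.

```python
REQUIRED_KEYS = ("max_age", "num_nodes", "num_resources")


def missing_keys(params, required=REQUIRED_KEYS):
    """Return the required keys that are absent from a dict of parameters."""
    if not isinstance(params, dict):
        raise TypeError("expected a dict of parameters, got %s" % type(params).__name__)
    return [key for key in required if key not in params]


complete = {"max_age": 60, "num_nodes": 2, "num_resources": 11}
incomplete = {"max_age": 60}

print(missing_keys(complete))    # []
print(missing_keys(incomplete))  # ['num_nodes', 'num_resources']

# Direct indexing, as in the traceback, fails on the incomplete dict.
try:
    incomplete["num_resources"]
except KeyError as error:
    print("KeyError: %s" % error)
```

A dict that only carries "max_age" reproduces the KeyError from the crash report.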
It looks like you have an old rule in your system that sets the parameters as a tuple and not as a dictionary.
You can grep through your ~/etc/check_mk directory and search for heartbeat_crm settings.
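If you prefer a script over grep, the search could look roughly like this. It is a minimal sketch that walks the directory and reports every line containing the check name; grep_tree is a made-up helper name, and the script assumes it is run as the site user so that ~ points at the site directory.

```python
import io
import os


def grep_tree(root, needle="heartbeat_crm"):
    """Return (path, line_number, text) for each line under root containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with io.open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            hits.append((path, number, line.strip()))
            except (IOError, OSError):
                continue  # unreadable file, skip it
    return hits


if __name__ == "__main__":
    for path, number, text in grep_tree(os.path.expanduser("~/etc/check_mk")):
        print("%s:%d: %s" % (path, number, text))
```

Any hit in a rules file is a candidate for the old rule.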