Pacemaker Heartbeat CRM General - UNKN - check failed - please submit a crash report!

Hi,

After adding two Pacemaker nodes to monitoring, both report the warning: “Heartbeat CRM General UNKN - check failed - please submit a crash report!”
The clients are running check-mk-agent 1.6.0p17. The server is running OMD - Open Monitoring Distribution 1.6.0p17.

Crash Report:
Exception: KeyError ('num_resources')
Traceback:
  File "/omd/sites/Koronea/lib/python/cmk_base/checking.py", line 320, in execute_check
    raw_result = check_function(item, determine_check_params(params), section_content)
  File "/omd/sites/Koronea/share/check_mk/checks/heartbeat_crm", line 194, in check_heartbeat_crm
    if params['num_resources'] is not None:

What causes the check to throw this warning/error? How can I resolve this problem?

Some more information below.

Agent output:
<<<check_mk>>>
Version: 1.6.0p17
AgentOS: linux
Hostname: srv-01-sapconfig01
AgentDirectory: /etc/check_mk
DataDirectory: /var/lib/check_mk_agent
SpoolDirectory: /var/lib/check_mk_agent/spool
PluginsDirectory: /usr/lib/check_mk_agent/plugins
LocalDirectory: /usr/lib/check_mk_agent/local
OnlyFrom: 10.1.1.39 127.0.0.1 #10.0.20.1 10.0.20.2


<<<heartbeat_crm>>>
Stack: corosync
Current DC: srv-01-sapconfig01 (version 1.1.19+20181105.ccd6b5b10-3.16.1-1.1.19+20181105.ccd6b5b10) - partition with quorum
Last updated: Wed Sep 9 10:47:47 2020
Last change: Fri Aug 21 21:22:12 2020 by hacluster via crmd on srv-01-sapconfig01
2 nodes configured
11 resources configured
Online: [ srv-01-sapconfig01 srv-01-sapconfig02 ]
Full list of resources:
Clone Set: cl-storage [g-storage]
_ Started: [ srv-01-sapconfig01 srv-01-sapconfig02 ]
Clone Set: cl-nfsserver [nfsserver]
_ Started: [ srv-01-sapconfig01 srv-01-sapconfig02 ]
Resource Group: grp_NFS
_ rsc_VG_vgnfs (ocf::heartbeat:LVM): Started srv-01-sapconfig01
_ rsc_IP_nfs (ocf::heartbeat:IPaddr2): Started srv-01-sapconfig01
_ rsc_FS_vgnfs (ocf::heartbeat:Filesystem): Started srv-01-sapconfig01
_ exportfs (ocf::heartbeat:exportfs): Started srv-01-sapconfig01
stonith-sbd (stonith:external/sbd): Started srv-01-sapconfig01

It looks like the inventory function has not created the correct parameters for your discovered services.
Can you please run a test on the command line?
cmk --debug -vvII hostname
Once this has completed without error, please have a look at the autochecks file it created:
~/var/check_mk/autochecks/hostname.mk
In this file, the line for the “heartbeat_crm” check should also contain a parameter entry.
Is that the case?
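
If you would rather not check the file by eye, here is a minimal sketch that pulls the parameters of the heartbeat_crm entry out of the autochecks text. It assumes the 1.6 file layout with one entry per line and a flat parameters dictionary; the function names and the list of required keys are my own choice for illustration, not part of Checkmk.

```python
import ast
import re

# Matches only the plain 'heartbeat_crm' entry (not 'heartbeat_crm.resources')
# and captures its parameters, which are either a flat dict literal or None.
# Limitation: a nested dict inside the parameters would be cut short.
ENTRY_RE = re.compile(
    r"'check_plugin_name':\s*'heartbeat_crm',.*?'parameters':\s*(\{.*?\}|None)"
)


def heartbeat_crm_parameters(autochecks_text):
    """Return the discovered parameters of the heartbeat_crm service."""
    match = ENTRY_RE.search(autochecks_text)
    if match is None:
        raise LookupError("no heartbeat_crm entry found")
    # The captured text is a plain Python literal, so literal_eval is safe here.
    return ast.literal_eval(match.group(1))


def missing_keys(parameters, required=("num_resources", "num_nodes")):
    """List the required keys that are absent from the parameters."""
    if not isinstance(parameters, dict):
        return list(required)
    return [key for key in required if key not in parameters]
```

To use it, read ~/var/check_mk/autochecks/hostname.mk into a string, pass it to heartbeat_crm_parameters() and then hand the result to missing_keys(). An empty list means both keys were written at discovery time.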

Running the test cmk --debug -vvII hostname completed without error.

The content of the file hostname.mk looks OK.
[
  {'check_plugin_name': 'cpu.loads', 'item': None, 'parameters': cpuload_default_levels, 'service_labels': {}},
  {'check_plugin_name': 'cpu.threads', 'item': None, 'parameters': {}, 'service_labels': {}},
  {'check_plugin_name': 'df', 'item': u'/', 'parameters': {}, 'service_labels': {}},
  {'check_plugin_name': 'df', 'item': u'/boot', 'parameters': {}, 'service_labels': {}},
  {'check_plugin_name': 'df', 'item': u'/srv/nfs', 'parameters': {}, 'service_labels': {}},
  {'check_plugin_name': 'df', 'item': u'/tmp', 'parameters': {}, 'service_labels': {}},
  {'check_plugin_name': 'df', 'item': u'/var', 'parameters': {}, 'service_labels': {}},
  {'check_plugin_name': 'diskstat', 'item': u'SUMMARY', 'parameters': diskstat_default_levels, 'service_labels': {}},
  {'check_plugin_name': 'heartbeat_crm', 'item': None, 'parameters': {'num_resources': 11, 'num_nodes': 2}, 'service_labels': {}},
  {'check_plugin_name': 'heartbeat_crm.resources', 'item': u'cl-nfsserver', 'parameters': None, 'service_labels': {}},
  {'check_plugin_name': 'heartbeat_crm.resources', 'item': u'cl-storage', 'parameters': None, 'service_labels': {}},
  {'check_plugin_name': 'heartbeat_crm.resources', 'item': u'grp_NFS', 'parameters': None, 'service_labels': {}},
  {'check_plugin_name': 'heartbeat_crm.resources', 'item': u'stonith-sbd', 'parameters': None, 'service_labels': {}},
  {'check_plugin_name': 'kernel', 'item': u'Context Switches', 'parameters': kernel_default_levels, 'service_labels': {}},
  {'check_plugin_name': 'kernel', 'item': u'Major Page Faults', 'parameters': kernel_default_levels, 'service_labels': {}},
  {'check_plugin_name': 'kernel', 'item': u'Process Creations', 'parameters': kernel_default_levels, 'service_labels': {}},
  {'check_plugin_name': 'kernel.util', 'item': None, 'parameters': {}, 'service_labels': {}},
  {'check_plugin_name': 'lnx_if', 'item': u'2', 'parameters': {'state': ['1'], 'speed': 10000000000}, 'service_labels': {}},
  {'check_plugin_name': 'mem.linux', 'item': None, 'parameters': {}, 'service_labels': {}},
  {'check_plugin_name': 'mounts', 'item': u'/', 'parameters': [u'attr2', u'inode64', u'noquota', u'relatime', u'rw'], 'service_labels': {}},
  {'check_plugin_name': 'mounts', 'item': u'/boot', 'parameters': [u'data=ordered', u'relatime', u'rw'], 'service_labels': {}},
  {'check_plugin_name': 'mounts', 'item': u'/srv/nfs', 'parameters': [u'attr2', u'inode64', u'noquota', u'relatime', u'rw'], 'service_labels': {}},
  {'check_plugin_name': 'mounts', 'item': u'/tmp', 'parameters': [u'attr2', u'inode64', u'noquota', u'relatime', u'rw'], 'service_labels': {}},
  {'check_plugin_name': 'mounts', 'item': u'/var', 'parameters': [u'attr2', u'inode64', u'noquota', u'relatime', u'rw'], 'service_labels': {}},
  {'check_plugin_name': 'ntp.time', 'item': None, 'parameters': {}, 'service_labels': {}},
  {'check_plugin_name': 'postfix_mailq', 'item': u'', 'parameters': {}, 'service_labels': {}},
  {'check_plugin_name': 'postfix_mailq_status', 'item': u'', 'parameters': None, 'service_labels': {}},
  {'check_plugin_name': 'systemd_units.services_summary', 'item': u'Summary', 'parameters': {}, 'service_labels': {}},
  {'check_plugin_name': 'tcp_conn_stats', 'item': None, 'parameters': tcp_conn_stats_default_levels, 'service_labels': {}},
  {'check_plugin_name': 'uptime', 'item': None, 'parameters': {}, 'service_labels': {}},
]

This looks OK: “num_resources” is present here as a parameter.
If you run “cmk --debug -vvn hostname” on the command line, do you get the same error as before?

Yes,

Traceback (most recent call last):
  File "/omd/sites/Koronea/bin/cmk", line 100, in <module>
    exit_status = modes.call("--check", None, opts, args)
  File "/omd/sites/Koronea/lib/python/cmk_base/modes/__init__.py", line 72, in call
    return mode.handler_function(handler_args)
  File "/omd/sites/Koronea/lib/python/cmk_base/modes/check_mk.py", line 1579, in mode_check
    return checking.do_check(hostname, ipaddress, options.get("checks"))
  File "/omd/sites/Koronea/lib/python/cmk_base/decorator.py", line 58, in wrapped_check_func
    status, infotexts, long_infotexts, perfdata = check_func(hostname, *args, **kwargs)
  File "/omd/sites/Koronea/lib/python/cmk_base/checking.py", line 111, in do_check
    _do_all_checks_on_host(sources, host_config, ipaddress, only_check_plugin_names)
  File "/omd/sites/Koronea/lib/python/cmk_base/checking.py", line 247, in _do_all_checks_on_host
    service.description)
  File "/omd/sites/Koronea/lib/python/cmk_base/checking.py", line 320, in execute_check
    raw_result = check_function(item, determine_check_params(params), section_content)
  File "/omd/sites/Koronea/share/check_mk/checks/heartbeat_crm", line 194, in check_heartbeat_crm
    if params['num_resources'] is not None:
KeyError: 'num_resources'

Does anyone have any ideas? I cannot find a solution for this :frowning:

Does the error also occur if you run “cmk --debug -vvII hostname” for the affected host?
The params['num_resources'] value is set at discovery time. If it is missing, then something went wrong during the discovery of the host.
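
To illustrate why a missing key ends in exactly this crash, here is a simplified stand-in for the lookup on line 194 of the traceback. It is not the shipped check code: both function names and the returned strings are made up, and the tolerant variant only shows how a .get() lookup behaves differently, it is not a recommended patch.

```python
def check_strict(params):
    """Subscript access, like the line shown in the traceback."""
    # Raises KeyError when 'num_resources' was never stored in params.
    if params["num_resources"] is not None:
        return "expecting %d resources" % params["num_resources"]
    return "resource count not checked"


def check_tolerant(params):
    """Same decision, but a missing key is treated like None."""
    expected = (params or {}).get("num_resources")
    if expected is not None:
        return "expecting %d resources" % expected
    return "resource count not checked"
```

With the parameters from the autochecks file above, both variants behave the same. Only a parameter dictionary without the key makes the strict version raise KeyError, which is what the crash report shows, so the parameters that actually reach the check must differ from the ones in the autochecks file.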