Cluster Checks crashing (Apache and MRPE)

CMK version: 2.1.0p6
OS version: current Appliance

After upgrading to 2.1, cluster checks crash once I change the cluster aggregation mode to “best” or “failover”. This affects all of my MRPE checks and, interestingly, one Apache cluster check.

Output of “cmk --debug -vvn hostname”:

OMD[checkmk]:~$ cmk --debug -vvn clusterhost
Checkmk version 2.1.0p6
Try license usage history update.
Trying to acquire lock on /omd/sites/checkmk/var/check_mk/license_usage/next_run
Got lock on /omd/sites/checkmk/var/check_mk/license_usage/next_run
Trying to acquire lock on /omd/sites/checkmk/var/check_mk/license_usage/history.json
Got lock on /omd/sites/checkmk/var/check_mk/license_usage/history.json
Next run time has not been reached yet. Abort.
Releasing lock on /omd/sites/checkmk/var/check_mk/license_usage/history.json
Released lock on /omd/sites/checkmk/var/check_mk/license_usage/history.json
Releasing lock on /omd/sites/checkmk/var/check_mk/license_usage/next_run
Released lock on /omd/sites/checkmk/var/check_mk/license_usage/next_run
+ FETCHING DATA
  Source: SourceType.HOST/FetcherType.PROGRAM
[cpu_tracking] Start [7f4de8bbc0a0]
[ProgramFetcher] Fetch with cache settings: DefaultAgentFileCache(host1.fqdn, base_path=/omd/sites/checkmk/tmp/check_mk/cache, max_age=MaxAge(checking=90, discovery=120, inventory=120), disabled=False, use_outdated=False, simulation=False)
Using data from cache file /omd/sites/checkmk/tmp/check_mk/cache/host1.fqdn
Got 71391 bytes data from cache
[ProgramFetcher] Use cached data
[cpu_tracking] Stop [7f4de8bbc0a0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
  Source: SourceType.HOST/FetcherType.PIGGYBACK
[cpu_tracking] Start [7f4de8832580]
[PiggybackFetcher] Fetch with cache settings: NoCache(host1.fqdn, base_path=/omd/sites/checkmk/tmp/check_mk/data_source_cache/piggyback, max_age=MaxAge(checking=90, discovery=120, inventory=120), disabled=True, use_outdated=False, simulation=False)
Not using cache (Cache usage disabled)
[PiggybackFetcher] Execute data source
No piggyback files for 'host1.fqdn'. Skip processing.
No piggyback files for 'www.xxx.yyy.zzz'. Skip processing.
Not using cache (Cache usage disabled)
[cpu_tracking] Stop [7f4de8832580 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
  Source: SourceType.HOST/FetcherType.PROGRAM
[cpu_tracking] Start [7f4de87cd160]
[ProgramFetcher] Fetch with cache settings: DefaultAgentFileCache(host2.fqdn, base_path=/omd/sites/checkmk/tmp/check_mk/cache, max_age=MaxAge(checking=90, discovery=120, inventory=120), disabled=False, use_outdated=False, simulation=False)
Using data from cache file /omd/sites/checkmk/tmp/check_mk/cache/host2.fqdn
Got 71295 bytes data from cache
[ProgramFetcher] Use cached data
[cpu_tracking] Stop [7f4de87cd160 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
  Source: SourceType.HOST/FetcherType.PIGGYBACK
[cpu_tracking] Start [7f4de87cd340]
[PiggybackFetcher] Fetch with cache settings: NoCache(host2.fqdn, base_path=/omd/sites/checkmk/tmp/check_mk/data_source_cache/piggyback, max_age=MaxAge(checking=90, discovery=120, inventory=120), disabled=True, use_outdated=False, simulation=False)
Not using cache (Cache usage disabled)
[PiggybackFetcher] Execute data source
No piggyback files for 'host2.fqdn'. Skip processing.
No piggyback files for '84.23.226.58'. Skip processing.
Not using cache (Cache usage disabled)
[cpu_tracking] Stop [7f4de87cd340 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
+ PARSE FETCHER RESULTS
  Source: SourceType.HOST/FetcherType.PROGRAM
<<<check_mk>>> / Transition NOOPParser -> HostSectionParser
<<<cmk_agent_ctl_status:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<checkmk_agent_plugins_lnx:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<labels:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<df>>> / Transition HostSectionParser -> HostSectionParser
<<<df>>> / Transition HostSectionParser -> HostSectionParser
<<<systemd_units>>> / Transition HostSectionParser -> HostSectionParser
<<<nfsmounts>>> / Transition HostSectionParser -> HostSectionParser
<<<cifsmounts>>> / Transition HostSectionParser -> HostSectionParser
<<<mounts>>> / Transition HostSectionParser -> HostSectionParser
<<<ps_lnx>>> / Transition HostSectionParser -> HostSectionParser
<<<mem>>> / Transition HostSectionParser -> HostSectionParser
<<<cpu>>> / Transition HostSectionParser -> HostSectionParser
<<<uptime>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<tcp_conn_stats>>> / Transition HostSectionParser -> HostSectionParser
<<<diskstat>>> / Transition HostSectionParser -> HostSectionParser
<<<kernel>>> / Transition HostSectionParser -> HostSectionParser
<<<md>>> / Transition HostSectionParser -> HostSectionParser
<<<vbox_guest>>> / Transition HostSectionParser -> HostSectionParser
<<<postfix_mailq>>> / Transition HostSectionParser -> HostSectionParser
<<<postfix_mailq_status:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<job>>> / Transition HostSectionParser -> HostSectionParser
<<<ntp:cached(1657650408,120)>>> / Transition HostSectionParser -> HostSectionParser
<<<mrpe>>> / Transition HostSectionParser -> HostSectionParser
<<<mrpe>>> / Transition HostSectionParser -> HostSectionParser
<<<mrpe>>> / Transition HostSectionParser -> HostSectionParser
<<<mrpe>>> / Transition HostSectionParser -> HostSectionParser
<<<mrpe>>> / Transition HostSectionParser -> HostSectionParser
<<<local:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<yum:cached(1657614511,129600)>>> / Transition HostSectionParser -> HostSectionParser
<<<check_mk:cached(1657647346,3600)>>> / Transition HostSectionParser -> HostSectionParser
No persisted sections
  -> Add sections: ['check_mk', 'checkmk_agent_plugins_lnx', 'cifsmounts', 'cmk_agent_ctl_status', 'cpu', 'df', 'diskstat', 'job', 'kernel', 'labels', 'lnx_if', 'local', 'md', 'mem', 'mounts', 'mrpe', 'nfsmounts', 'ntp', 'postfix_mailq', 'postfix_mailq_status', 'ps_lnx', 'systemd_units', 'tcp_conn_stats', 'uptime', 'vbox_guest', 'yum']
  Source: SourceType.HOST/FetcherType.PIGGYBACK
No persisted sections
  -> Add sections: []
  Source: SourceType.HOST/FetcherType.PROGRAM
<<<check_mk>>> / Transition NOOPParser -> HostSectionParser
<<<cmk_agent_ctl_status:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<checkmk_agent_plugins_lnx:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<labels:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<df>>> / Transition HostSectionParser -> HostSectionParser
<<<df>>> / Transition HostSectionParser -> HostSectionParser
<<<systemd_units>>> / Transition HostSectionParser -> HostSectionParser
<<<nfsmounts>>> / Transition HostSectionParser -> HostSectionParser
<<<cifsmounts>>> / Transition HostSectionParser -> HostSectionParser
<<<mounts>>> / Transition HostSectionParser -> HostSectionParser
<<<ps_lnx>>> / Transition HostSectionParser -> HostSectionParser
<<<mem>>> / Transition HostSectionParser -> HostSectionParser
<<<cpu>>> / Transition HostSectionParser -> HostSectionParser
<<<uptime>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<tcp_conn_stats>>> / Transition HostSectionParser -> HostSectionParser
<<<diskstat>>> / Transition HostSectionParser -> HostSectionParser
<<<kernel>>> / Transition HostSectionParser -> HostSectionParser
<<<md>>> / Transition HostSectionParser -> HostSectionParser
<<<vbox_guest>>> / Transition HostSectionParser -> HostSectionParser
<<<postfix_mailq>>> / Transition HostSectionParser -> HostSectionParser
<<<postfix_mailq_status:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<job>>> / Transition HostSectionParser -> HostSectionParser
<<<ntp:cached(1657650415,120)>>> / Transition HostSectionParser -> HostSectionParser
<<<mrpe>>> / Transition HostSectionParser -> HostSectionParser
<<<mrpe>>> / Transition HostSectionParser -> HostSectionParser
<<<mrpe>>> / Transition HostSectionParser -> HostSectionParser
<<<mrpe>>> / Transition HostSectionParser -> HostSectionParser
<<<mrpe>>> / Transition HostSectionParser -> HostSectionParser
<<<local:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<yum:cached(1657617359,129600)>>> / Transition HostSectionParser -> HostSectionParser
<<<check_mk:cached(1657650155,3600)>>> / Transition HostSectionParser -> HostSectionParser
No persisted sections
  -> Add sections: ['check_mk', 'checkmk_agent_plugins_lnx', 'cifsmounts', 'cmk_agent_ctl_status', 'cpu', 'df', 'diskstat', 'job', 'kernel', 'labels', 'lnx_if', 'local', 'md', 'mem', 'mounts', 'mrpe', 'nfsmounts', 'ntp', 'postfix_mailq', 'postfix_mailq_status', 'ps_lnx', 'systemd_units', 'tcp_conn_stats', 'uptime', 'vbox_guest', 'yum']
  Source: SourceType.HOST/FetcherType.PIGGYBACK
No persisted sections
  -> Add sections: []
Received no piggyback data
Received no piggyback data
Received no piggyback data
Received no piggyback data
[cpu_tracking] Start [7f4de87dc700]
value store: synchronizing
Trying to acquire lock on /omd/sites/checkmk/tmp/check_mk/counters/clusterhost
Got lock on /omd/sites/checkmk/tmp/check_mk/counters/clusterhost
value store: loading from disk
Releasing lock on /omd/sites/checkmk/tmp/check_mk/counters/clusterhost
Released lock on /omd/sites/checkmk/tmp/check_mk/counters/clusterhost
[cpu_tracking] Stop [7f4de87dc700 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
Trying to acquire lock on /omd/sites/checkmk/var/check_mk/crashes/base/6ab8d45c-0210-11ed-a013-e63601754f21/crash.info
Got lock on /omd/sites/checkmk/var/check_mk/crashes/base/6ab8d45c-0210-11ed-a013-e63601754f21/crash.info
Releasing lock on /omd/sites/checkmk/var/check_mk/crashes/base/6ab8d45c-0210-11ed-a013-e63601754f21/crash.info
Released lock on /omd/sites/checkmk/var/check_mk/crashes/base/6ab8d45c-0210-11ed-a013-e63601754f21/crash.info
Traceback (most recent call last):
  File "/omd/sites/checkmk/bin/cmk", line 98, in <module>
    exit_status = modes.call("--check", None, opts, args)
  File "/omd/sites/checkmk/lib/python3/cmk/base/modes/__init__.py", line 69, in call
    return handler(*handler_args)
  File "/omd/sites/checkmk/lib/python3/cmk/base/modes/check_mk.py", line 1804, in mode_check
    checking.commandline_checking(
  File "/omd/sites/checkmk/lib/python3/cmk/base/agent_based/decorator.py", line 43, in wrapped_check_func
    status, output_text = _combine_texts(check_func(hostname, *args, **kwargs))
  File "/omd/sites/checkmk/lib/python3/cmk/base/agent_based/checking/__init__.py", line 121, in commandline_checking
    return _execute_checkmk_checks(
  File "/omd/sites/checkmk/lib/python3/cmk/base/agent_based/checking/__init__.py", line 174, in _execute_checkmk_checks
    num_success, plugins_missing_data = check_host_services(
  File "/omd/sites/checkmk/lib/python3/cmk/base/agent_based/checking/__init__.py", line 322, in check_host_services
    success = _execute_check(
  File "/omd/sites/checkmk/lib/python3/cmk/base/agent_based/checking/__init__.py", line 382, in _execute_check
    submittable = get_aggregated_result(
  File "/omd/sites/checkmk/lib/python3/cmk/base/agent_based/checking/__init__.py", line 470, in get_aggregated_result
    result = _aggregate_results(
  File "/omd/sites/checkmk/lib/python3/cmk/base/agent_based/checking/__init__.py", line 578, in _aggregate_results
    perfdata, results = _consume_and_dispatch_result_types(subresults)
  File "/omd/sites/checkmk/lib/python3/cmk/base/agent_based/checking/__init__.py", line 622, in _consume_and_dispatch_result_types
    for subr in subresults:
  File "/omd/sites/checkmk/lib/python3/cmk/base/agent_based/checking/_cluster_modes.py", line 145, in _cluster_check
    yield from summarizer.secondary_results(
  File "/omd/sites/checkmk/lib/python3/cmk/base/agent_based/checking/_cluster_modes.py", line 225, in secondary_results
    yield from (
  File "/omd/sites/checkmk/lib/python3/cmk/base/agent_based/checking/_cluster_modes.py", line 226, in <genexpr>
    Result(
  File "/omd/sites/checkmk/lib/python3/cmk/base/api/agent_based/checking_classes.py", line 367, in __new__
    state, summary, details = _create_result_fields(**kwargs)
  File "/omd/sites/checkmk/lib/python3/cmk/base/api/agent_based/checking_classes.py", line 405, in _create_result_fields
    raise ValueError(f"'{name}' must be non-empty str or None, got {var}")
ValueError: 'notice' must be non-empty str or None, got

The Apache cluster check crashed because the site had the name [::1]. After changing the name it worked.

OK… that wasn’t it. Discovery works, but then the check still crashes :confused:

The Apache issue is solved, I think. That one was a separate problem.
That leaves the MRPE problem.

I have the same problem with an MRPE check.

Crash report: 5587bac0-17ce-11ed-a313-a0369ff0d398

ValueError ('notice' must be non-empty str or None, got )

  File "/omd/sites/zdv/lib/python3/cmk/base/agent_based/checking/__init__.py", line 470, in get_aggregated_result
    result = _aggregate_results(
  File "/omd/sites/zdv/lib/python3/cmk/base/agent_based/checking/__init__.py", line 578, in _aggregate_results
    perfdata, results = _consume_and_dispatch_result_types(subresults)
  File "/omd/sites/zdv/lib/python3/cmk/base/agent_based/checking/__init__.py", line 622, in _consume_and_dispatch_result_types
    for subr in subresults:
  File "/omd/sites/zdv/lib/python3/cmk/base/agent_based/checking/_cluster_modes.py", line 145, in _cluster_check
    yield from summarizer.secondary_results(
  File "/omd/sites/zdv/lib/python3/cmk/base/agent_based/checking/_cluster_modes.py", line 225, in secondary_results
    yield from (
  File "/omd/sites/zdv/lib/python3/cmk/base/agent_based/checking/_cluster_modes.py", line 226, in <genexpr>
    Result(
  File "/omd/sites/zdv/lib/python3/cmk/base/api/agent_based/checking_classes.py", line 367, in __new__
    state, summary, details = _create_result_fields(**kwargs)
  File "/omd/sites/zdv/lib/python3/cmk/base/api/agent_based/checking_classes.py", line 405, in _create_result_fields
    raise ValueError(f"'{name}' must be non-empty str or None, got {var}")

{'details': '[nfsrefer-02]: Check command used in metric system: '
            'check_nfsreferrals_lock',
 'name': 'notice',
 'notice': '',
 'state': <State.OK: 0>,
 'summary': None,
 'var': ''}
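
The local variables above show `notice` arriving as an empty string while `details` is populated, which is exactly the case the last traceback line rejects. Below is a minimal sketch of that kind of validation, written only to reproduce the message. The function names `check_optional_text` and `empty_to_none` are hypothetical stand-ins, not Checkmk's actual code, and the guard at the end is just one conceivable way to avoid the crash, not necessarily what the official fix does.

```python
from typing import Optional


def check_optional_text(name: str, var: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in for the validation seen in the traceback.

    None is accepted, a non-empty string is accepted, anything else
    (including the empty string) is rejected.
    """
    if var is None:
        return None
    if not isinstance(var, str) or not var:
        raise ValueError(f"'{name}' must be non-empty str or None, got {var}")
    return var


def empty_to_none(text: Optional[str]) -> Optional[str]:
    """Hypothetical guard: map an empty string to None before validating."""
    return text or None


# Reproduce the message from the crash report with notice=''.
try:
    check_optional_text("notice", "")
except ValueError as exc:
    message = str(exc)

# With the guard in place, the empty notice is treated as "not given".
guarded = check_optional_text("notice", empty_to_none(""))
```

Because the offending value is an empty string, the message ends right after "got", which matches the oddly truncated `ValueError` line in both crash reports.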

I also have this problem: a clustered MRPE check on two cluster nodes, aggregated with “best” mode. CRE 2.1.0p9, Ubuntu 20.04.

The crash report looks like the one schlarbm posted.

Thanks for bringing this up, guys!
We already had an internal ticket on this. It will be fixed with Werk 14149 in 2.1.0p10. :adhesive_bandage:
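
For anyone checking several sites against that patch level, here is a rough sketch of a comparison helper. It assumes plain stable version strings of the form `2.1.0p9` as printed by `cmk` in the log above. The names `parse_version`, `FIXED_IN` and `has_fix` are made up for this example, and beta or innovation releases are deliberately not handled.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int, int]:
    """Split a string like '2.1.0p9' into (major, minor, sub, patch).

    A missing 'pN' suffix is treated as patch level 0.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:p(\d+))?", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, sub, patch = match.groups()
    return int(major), int(minor), int(sub), int(patch or 0)


# Patch level named in the post above as containing Werk 14149.
FIXED_IN = parse_version("2.1.0p10")


def has_fix(version: str) -> bool:
    """Return True if the version is at or above the fixed patch level."""
    return parse_version(version) >= FIXED_IN


affected = [v for v in ("2.1.0p6", "2.1.0p9", "2.1.0p10") if not has_fix(v)]
```

The versions are compared as integer tuples rather than as strings, because a plain string comparison would sort `2.1.0p9` after `2.1.0p10`.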
