Check_mk-isc_dhcpd 2.2.0 crashes if no leases in agent output

CMK version: 2.2.0p12
OS version: Debian bookworm (Proxmox LXC)

Error message: DHCP Pool - check failed - please submit a crash report! (Crash-ID: 5cbf89dc-7400-11ee-a2cc-363a195dbb1d)

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

cmk --debug -vvn opn.domain.tld
Checkmk version 2.2.0p12
+ FETCHING DATA
  Source: SourceInfo(hostname='opn.domain.tld', ipaddress='10.1.21.254', ident='agent', fetcher_type=<FetcherType.TCP: 8>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f1d55625590]
Read from cache: AgentFileCache(opn.domain.tld, path_template=/omd/sites/cmk/tmp/check_mk/cache/{hostname}, max_age=MaxAge(checking=0, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=6)
Not using cache (Too old. Age is 6 sec, allowed is 0 sec)
[TCPFetcher] Execute data source
Connecting via TCP to 10.1.21.254:6556 (5.0s timeout)
Detected transport protocol: TransportProtocol.PLAIN (b'<<')
Reading data from agent
Closing TCP connection to 10.1.21.254:6556
Write data to cache file /omd/sites/cmk/tmp/check_mk/cache/opn.domain.tld
Trying to acquire lock on /omd/sites/cmk/tmp/check_mk/cache/opn.domain.tld
Got lock on /omd/sites/cmk/tmp/check_mk/cache/opn.domain.tld
Releasing lock on /omd/sites/cmk/tmp/check_mk/cache/opn.domain.tld
Released lock on /omd/sites/cmk/tmp/check_mk/cache/opn.domain.tld
[cpu_tracking] Stop [7f1d55625590 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.6400000005960464))]
  Source: SourceInfo(hostname='opn.domain.tld', ipaddress='10.1.21.254', ident='piggyback', fetcher_type=<FetcherType.PIGGYBACK: 4>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f1d55625590]
Read from cache: NoCache(opn.domain.tld, path_template=/dev/null, max_age=MaxAge(checking=0.0, discovery=0.0, inventory=0.0), simulation=False, use_only_cache=False, file_cache_mode=1)
[PiggybackFetcher] Execute data source
No piggyback files for 'opn.domain.tld'. Skip processing.
No piggyback files for '10.1.21.254'. Skip processing.
[cpu_tracking] Stop [7f1d55625590 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
+ PARSE FETCHER RESULTS
<<<check_mk>>> / Transition NOOPParser -> HostSectionParser
<<<cpu>>> / Transition HostSectionParser -> HostSectionParser
<<<df>>> / Transition HostSectionParser -> HostSectionParser
<<<isc_dhcpd>>> / Transition HostSectionParser -> HostSectionParser
<<<kernel>>> / Transition HostSectionParser -> HostSectionParser
<<<labels:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<statgrab_mem>>> / Transition HostSectionParser -> HostSectionParser
<<<mounts>>> / Transition HostSectionParser -> HostSectionParser
<<<statgrab_net>>> / Transition HostSectionParser -> HostSectionParser
<<<netctr>>> / Transition HostSectionParser -> HostSectionParser
<<<ntp>>> / Transition HostSectionParser -> HostSectionParser
<<<ps>>> / Transition HostSectionParser -> HostSectionParser
<<<sshd_config>>> / Transition HostSectionParser -> HostSectionParser
<<<tcp_conn_stats>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_thermal:sep(124)>>> / Transition HostSectionParser -> HostSectionParser
<<<uptime>>> / Transition HostSectionParser -> HostSectionParser
<<<zfsget>>> / Transition HostSectionParser -> HostSectionParser
<<<zfs_arc_cache>>> / Transition HostSectionParser -> HostSectionParser
<<<zpool_status>>> / Transition HostSectionParser -> HostSectionParser
<<<local:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
  HostKey(hostname='opn.domain.tld', source_type=<SourceType.HOST: 1>)  -> Add sections: ['check_mk', 'cpu', 'df', 'isc_dhcpd', 'kernel', 'labels', 'lnx_thermal', 'local', 'mounts', 'netctr', 'ntp', 'ps', 'sshd_config', 'statgrab_mem', 'statgrab_net', 'tcp_conn_stats', 'uptime', 'zfs_arc_cache', 'zfsget', 'zpool_status']
  HostKey(hostname='opn.domain.tld', source_type=<SourceType.HOST: 1>)  -> Add sections: []
Received no piggyback data
[cpu_tracking] Start [7f1d55636390]
value store: synchronizing
Trying to acquire lock on /omd/sites/cmk/tmp/check_mk/counters/opn.domain.tld
Got lock on /omd/sites/cmk/tmp/check_mk/counters/opn.domain.tld
value store: loading from disk
Releasing lock on /omd/sites/cmk/tmp/check_mk/counters/opn.domain.tld
Released lock on /omd/sites/cmk/tmp/check_mk/counters/opn.domain.tld
[cpu_tracking] Stop [7f1d55636390 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.010000001639127731))]
Trying to acquire lock on /omd/sites/cmk/var/check_mk/crashes/base/60646652-7400-11ee-85e5-363a195dbb1d/crash.info
Got lock on /omd/sites/cmk/var/check_mk/crashes/base/60646652-7400-11ee-85e5-363a195dbb1d/crash.info
Releasing lock on /omd/sites/cmk/var/check_mk/crashes/base/60646652-7400-11ee-85e5-363a195dbb1d/crash.info
Released lock on /omd/sites/cmk/var/check_mk/crashes/base/60646652-7400-11ee-85e5-363a195dbb1d/crash.info
Traceback (most recent call last):
  File "/omd/sites/cmk/bin/cmk", line 118, in <module>
    exit_status = modes.call("--check", None, opts, args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3/cmk/base/modes/__init__.py", line 68, in call
    return handler(*handler_args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3/cmk/base/modes/check_mk.py", line 2003, in mode_check
    with error_handler:
  File "/omd/sites/cmk/lib/python3/cmk/checkers/error_handling.py", line 59, in __exit__
    self._result = _handle_failure(
                   ^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3/cmk/checkers/error_handling.py", line 95, in _handle_failure
    raise exc
  File "/omd/sites/cmk/lib/python3/cmk/base/modes/check_mk.py", line 2006, in mode_check
    check_result = checking.execute_checkmk_checks(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3/cmk/base/agent_based/checking/_checking.py", line 117, in execute_checkmk_checks
    service_results = check_host_services(
                      ^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3/cmk/base/agent_based/checking/_checking.py", line 293, in check_host_services
    submittables = [
                   ^
  File "/omd/sites/cmk/lib/python3/cmk/base/agent_based/checking/_checking.py", line 303, in <listcomp>
    else get_aggregated_result(
         ^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3/cmk/base/agent_based/checking/_checking.py", line 413, in get_aggregated_result
    consume_check_results(
  File "/omd/sites/cmk/lib/python3/cmk/base/api/agent_based/checking_classes.py", line 484, in consume_check_results
    for subr in subresults:
  File "/omd/sites/cmk/lib/python3/cmk/base/api/agent_based/register/check_plugins.py", line 93, in filtered_generator
    for element in generator(*args, **kwargs):
  File "/omd/sites/cmk/lib/python3/cmk/base/api/agent_based/register/check_plugins_legacy.py", line 207, in check_result_generator
    subresults = _normalize_check_function_return_value(sig_function(**kwargs))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3/cmk/base/api/agent_based/register/check_plugins_legacy.py", line 168, in _normalize_check_function_return_value
    return list(subresults)
           ^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/share/check_mk/checks/isc_dhcpd", line 95, in check_isc_dhcpd
    for check_result in check_dhcp_pools_levels(
  File "/omd/sites/cmk/lib/python3/cmk/base/check_legacy_includes/dhcp_pools.py", line 21, in check_dhcp_pools_levels
    for new_api_object in dhcp_pools.check_dhcp_pools_levels(free, used, pending, size, params):
  File "/omd/sites/cmk/lib/python3/cmk/base/plugins/agent_based/utils/dhcp_pools.py", line 32, in check_dhcp_pools_levels
    if (levels := params.get(f"{category}_leases")) is not None:
                  ^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'get'
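
The last frame of the traceback can be reproduced in isolation. This is only a minimal sketch, not the Checkmk source: `levels_for` is a hypothetical stand-in for the lookup at line 32 of `dhcp_pools.py`, and the tuple `(15.0, 5.0)` is just an example of tuple-style parameters.

```python
from typing import Any, Optional


def levels_for(params: Any, category: str) -> Optional[Any]:
    """Hypothetical stand-in for the lookup shown in the traceback."""
    return params.get(f"{category}_leases")


# Dict-style parameters work with a .get() lookup.
print(levels_for({"free_leases": (15.0, 5.0)}, "free"))

# Tuple-style parameters raise the same AttributeError as in the traceback.
try:
    levels_for((15.0, 5.0), "free")
except AttributeError as exc:
    print(exc)
```

If this matches what happens on the server, the crash is triggered by the shape of the check parameters rather than by the agent section.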

Agent output of isc_dhcpd:

<<<isc_dhcpd>>>
[general]
PID: 40391
[pools]
10.5.0.100	10.5.0.200
10.0.9.100	10.0.9.200
[leases]
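
To illustrate that an empty `[leases]` block is well-formed input, here is a rough sketch of how such a section could be read. It is not the Checkmk parser: the functions `parse_isc_dhcpd` and `pool_usage` are hypothetical, and the format of a lease line (one IP address per line) is an assumption, since the output above contains no leases.

```python
from ipaddress import IPv4Address


def parse_isc_dhcpd(lines):
    """Hypothetical parser: return (pids, pools, leases) from section lines."""
    pids, pools, leases = [], [], []
    mode = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("<<<"):
            continue  # skip blank lines and the section header
        if line.startswith("[") and line.endswith("]"):
            mode = line[1:-1]  # switch to [general], [pools] or [leases]
        elif mode == "general" and line.startswith("PID:"):
            pids.append(int(line.split(":", 1)[1]))
        elif mode == "pools":
            start, end = line.split()
            pools.append((IPv4Address(start), IPv4Address(end)))
        elif mode == "leases":
            # ASSUMPTION: one leased IP address per line.
            leases.append(IPv4Address(line))
    return pids, pools, leases


def pool_usage(pools, leases):
    """Yield (start, end, size, used) for each pool."""
    for start, end in pools:
        size = int(end) - int(start) + 1
        used = sum(1 for ip in leases if start <= ip <= end)
        yield str(start), str(end), size, used


SECTION = """<<<isc_dhcpd>>>
[general]
PID: 40391
[pools]
10.5.0.100\t10.5.0.200
10.0.9.100\t10.0.9.200
[leases]
"""

pids, pools, leases = parse_isc_dhcpd(SECTION.splitlines())
for row in pool_usage(pools, leases):
    print(row)
```

In this sketch both pools come out with a size of 101 addresses and 0 used leases, so the empty `[leases]` block by itself does not cause an error.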

It could be that the check parameter ruleset has been changed and you still have an old rule.

Try to edit the existing rule(s) in the parameter ruleset for that service check.

It’s a freshly installed Checkmk 2.2; the effective rules on the check are completely default.

I checked the same agent with a Checkmk 2.1 server, where the only rule adjustment is the service check interval. On that server the check_mk-isc_dhcpd check is working properly.

There’s a Python exception in the cmk debug output, so I think it’s a bug in the Checkmk code.

I'm getting the same error on newly created DHCP pools. The existing ones work without problems, even if there are 0 leases used.

@martin.hirschvogel can you help?

Hi,

Maybe the plugin on the host is an older version?
Could you please check the plugin version on the host?
It should be at /usr/lib/check_mk_agent/plugins/isc_dhcpd.py

You can find the version number in the first lines of the code.

Thanks.

KR,
Max

The fact is, I have a freshly installed Checkmk 2.2.0 server, and our OPNsense agent (GitHub: bashclub/checkmk-opnsense-agent) outputs the data as shown in my first post.
This works fine in Checkmk 2.0.0 and 2.1.0, but the server-side check crashes in version 2.2.0.
I’ve tested another Checkmk 2.2.0 server against an OPNsense agent; the number of leases is irrelevant, the check fails anyway.
It looks like the expected agent output of the DHCP check has changed from 2.1.0 to 2.2.0 and backward compatibility is broken.
Please let me know what agent output is expected, so I can fix the OPNsense agent.

Other users have the same problem:

head /usr/lib/check_mk_agent/plugins/isc_dhcpd.py 
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Copyright (C) 2019 Checkmk GmbH - License: GNU General Public License v2
# This file is part of Checkmk (https://checkmk.com). It is subject to the terms and
# conditions defined in the file COPYING, which is part of this source code package.

__version__ = "2.2.0p14"

# Monitor leases if ISC-DHCPD
import calendar

still crashing…

Can anyone reproduce this on Linux with our official agent and plugin?

I think @hafr already did and I could also reproduce this in a test environment:

Freshly installed latest Checkmk 2.2.0 version:

Freshly installed Debian bookworm server with isc-dhcp-server, the latest Checkmk agent and isc_dhcpd.py:

crashing on checkmk server:

crash report:

IMHO that’s enough to prove it’s a bug on the server side and not agent-dependent.

Thanks for reproducing the problem.
We have opened an internal ticket on this.

Is there a solution to the internal ticket?

We are still looking into the issue. We will update here, when there is something to report.

The problem should be solved as part of Werk 16323.

Still crashing at my site. Submitted new crash report.

Same here with the OPNsense agent and in my test environment with the Linux agent.

I’ve updated the server, the Linux agent and the agent plugin.

Maybe the server update log helps:

2023-12-04 18:14:34 - Updating site 'zmbrocks' from version 2.2.0p14.cre to 2.2.0p16.cre...

Creating temporary filesystem /omd/sites/zmbrocks/tmp...OK
Executing update-pre-hooks script "01_mkp-disable-outdated"...OK
Executing update-pre-hooks script "02_cmk-update-config"...
-| ATTENTION
-|   Some steps may take a long time depending on your installation.
-|   Please be patient.
-| 
-| Verifying Checkmk configuration...
-|  01/04 Rulesets...
-|  02/04 UI extensions...
-|  03/04 Agent based plugins...
-|  04/04 Deprecated .mk configuration of plugins...
-| Done (success)
-| 
-| Updating Checkmk configuration...
-|  01/18 Validate user IDs...
-|  02/18 Update views...
-|  03/18 Update dashboards...
-|  04/18 User attributes...
-|  05/18 Global settings...
-|  06/18 Update LDAP connection ids...
-|  07/18 Rulesets...
-|  08/18 Autochecks...
-| Transform failed: host='10.200.30.253', plugin='isc_dhcpd', ruleset='win_dhcp_pools', params=(15.0, 5.0), error=AssertionError('Dictionary.transform_value() got a non-dict: (15.0, 5.0)')
-|  09/18 Remove unused host attributes....
-|  10/18 Convert persisted sections...
-|  11/18 Cleanup version specific caches...
-|  12/18 Background jobs...
-|  13/18 Extract remote sites CAs...
-|  14/18 Add a rule_id to each notification rule...
-|  15/18 Change absolute paths in registered hosts symlinks to relative...
-|  16/18 Remove old custom logos...
-|  17/18 Check for incompatible password hashes...
-|  18/18 Update backup config...
-| Done (success)
OK
Updating core configuration...
Generating configuration for core (type nagios)...
Precompiling host checks...OK
Finished update.
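
The "Transform failed" line shows tuple parameters `(15.0, 5.0)` being rejected where a dictionary is expected. As a rough sketch of what a tolerant conversion could look like (not the actual Checkmk transform): `normalize_dhcp_params` is a hypothetical helper, and mapping the tuple to the `free_leases` key is an assumption based only on the `f"{category}_leases"` lookup in the traceback.

```python
from collections.abc import Mapping


def normalize_dhcp_params(params):
    """Return dict-style parameters, wrapping a legacy (warn, crit) tuple.

    ASSUMPTION: the legacy tuple holds levels for free leases, hence the
    'free_leases' key. The thread does not confirm this mapping.
    """
    if isinstance(params, Mapping):
        return dict(params)
    if isinstance(params, tuple) and len(params) == 2:
        warn, crit = params
        return {"free_leases": (warn, crit)}
    raise TypeError(f"unsupported DHCP pool parameters: {params!r}")


print(normalize_dhcp_params((15.0, 5.0)))
print(normalize_dhcp_params({"used_leases": (80.0, 90.0)}))
```

With a conversion like this in front of the check, a `.get()` lookup on the parameters would no longer hit a tuple.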

Submitted new crash reports from both environments.

Can you try “Remove all and find new” on one of those hosts? That might already fix it.
The failed transform is certainly not nice but might be more of a cosmetic issue.

@robin.gierse After “Remove all and find new”, every single DHCP check on my host failed - including the previously working ones. There was one exception: I had created a rule with different levels for one check, and that one was still working. After creating a new rule that overrides the default levels, all previously crashing checks are working again. It seems like there is a problem with the defaults.

That doesn’t change anything for me. On both of my Checkmk systems, there are no additional rules configured.