Agent_proxmox_ve breaks checkmk 2.5.0 when no subscription is present

CMK version: 2.5.0
OS version: Debian 13

Error message: Failed: 'status' - please submit a crash report! (Crash-ID: eb682010-4322-11f1-8000-bc2411a8ad4d)(!!)

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

value store: loading from disk
Checkmk version 2.5.0
+ FETCHING DATA
  Source: SourceInfo(hostname='<HOSTNAME>', ipaddress='<HOSTIP>', ident='agent', fetcher_type=<FetcherType.TCP: 8>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f58c2b5cec0]
Read from cache: AgentFileCache(base_path=/omd/sites/monitoring, relative_path_template=tmp/check_mk/cache/<HOSTNAME>, max_age=MaxAge(checking=90, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=6)
Using data from cache file /omd/sites/monitoring/tmp/check_mk/cache/<HOSTNAME>
Got 213398 bytes data from cache
[cpu_tracking] Stop [7f58c2b5cec0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
  Source: SourceInfo(hostname='<HOSTNAME>', ipaddress='<HOSTIP>', ident='special_proxmox_ve', fetcher_type=<FetcherType.SPECIAL_AGENT: 6>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f58c307a0d0]
Read from cache: AgentFileCache(base_path=/omd/sites/monitoring, relative_path_template=tmp/check_mk/data_source_cache/special_proxmox_ve/<HOSTNAME>, max_age=MaxAge(checking=90, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=6)
Not using cache (does not exist)
Calling: /omd/sites/monitoring/lib/python3.13/site-packages/cmk/plugins/proxmox_ve/libexec/agent_proxmox_ve -u checkmk@pve --password-id uuid70157560-f509-4f36-a7b3-bcde8aa1038e:/omd/sites/monitoring/var/check_mk/passwords_merged --no-cert-check <HOSTIP>
Get data from program
[cpu_tracking] Stop [7f58c307a0d0 - Snapshot(process=posix.times_result(user=0.010000000000000009, system=0.0, children_user=0.4, children_system=0.04, elapsed=20.86999999731779))]
  Source: SourceInfo(hostname='<HOSTNAME>', ipaddress='<HOSTIP>', ident='piggyback', fetcher_type=<FetcherType.PIGGYBACK: 4>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f58c2d287d0]
Read from cache: NoCache(base_path=/dev/null, relative_path_template=, max_age=MaxAge(checking=0.0, discovery=0.0, inventory=0.0), simulation=False, use_only_cache=False, file_cache_mode=1)
[cpu_tracking] Stop [7f58c2d287d0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
[cpu_tracking] Start [7f58c2c51220]
+ PARSE FETCHER RESULTS
<<<check_mk>>> / Transition NOOPParser -> HostSectionParser
<<<cmk_agent_ctl_status:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<checkmk_agent_plugins_lnx:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<labels:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<df_v2>>> / Transition HostSectionParser -> HostSectionParser
<<<df_v2>>> / Transition HostSectionParser -> HostSectionParser
<<<systemd_units>>> / Transition HostSectionParser -> HostSectionParser
<<<zfsget:sep(9)>>> / Transition HostSectionParser -> HostSectionParser
<<<zfsget>>> / Transition HostSectionParser -> HostSectionParser
<<<nfsmounts_v2:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<cifsmounts>>> / Transition HostSectionParser -> HostSectionParser
<<<mounts>>> / Transition HostSectionParser -> HostSectionParser
<<<ps_lnx>>> / Transition HostSectionParser -> HostSectionParser
<<<mem>>> / Transition HostSectionParser -> HostSectionParser
<<<cpu>>> / Transition HostSectionParser -> HostSectionParser
<<<uptime>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_bonding:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<tcp_conn_stats>>> / Transition HostSectionParser -> HostSectionParser
<<<diskstat>>> / Transition HostSectionParser -> HostSectionParser
<<<kernel>>> / Transition HostSectionParser -> HostSectionParser
<<<md>>> / Transition HostSectionParser -> HostSectionParser
<<<vbox_guest>>> / Transition HostSectionParser -> HostSectionParser
<<<corosync_latency>>> / Transition HostSectionParser -> HostSectionParser
<<<postfix_mailq>>> / Transition HostSectionParser -> HostSectionParser
<<<postfix_mailq_status:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<zpool_status>>> / Transition HostSectionParser -> HostSectionParser
<<<zpool>>> / Transition HostSectionParser -> HostSectionParser
<<<job>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_thermal:sep(124)>>> / Transition HostSectionParser -> HostSectionParser
<<<pvecm_status:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<pvecm_nodes>>> / Transition HostSectionParser -> HostSectionParser
<<<chrony:cached(1777402335,120)>>> / Transition HostSectionParser -> HostSectionParser
<<<local:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<cmk_update_agent_status:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
Get piggybacked data
0 piggyback files for '<HOSTNAME>'.
0 piggyback files for '<HOSTIP>'.
  HostKey(hostname='<HOSTNAME>', source_type=<SourceType.HOST: 1>)  -> Add sections: ['check_mk', 'checkmk_agent_plugins_lnx', 'chrony', 'cifsmounts', 'cmk_agent_ctl_status', 'cmk_update_agent_status', 'corosync_latency', 'cpu', 'df_v2', 'diskstat', 'job', 'kernel', 'labels', 'lnx_bonding', 'lnx_if', 'lnx_thermal', 'local', 'md', 'mem', 'mounts', 'nfsmounts_v2', 'postfix_mailq', 'postfix_mailq_status', 'ps_lnx', 'pvecm_nodes', 'pvecm_status', 'systemd_units', 'tcp_conn_stats', 'uptime', 'vbox_guest', 'zfsget', 'zpool', 'zpool_status']
  HostKey(hostname='<HOSTNAME>', source_type=<SourceType.HOST: 1>)  -> Add sections: []
Received no piggyback data
0 piggyback files for '<HOSTNAME>'.
[cpu_tracking] Stop [7f58c2c51220 - Snapshot(process=posix.times_result(user=0.1100000000000001, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.10999999940395355))]
[agent] Success, [special_proxmox_ve] Failed: 'status' - please submit a crash report! (Crash-ID: eb682010-4322-11f1-8000-bc2411a8ad4d)(!!), [piggyback] Success (but no data found for this host), execution time 21.0 sec | execution_time=20.980 user_time=0.120 system_time=0.000 children_user_time=0.400 children_system_time=0.040 cmk_time_agent=0.000 cmk_time_ds=20.420
Agent exited with code 1: /omd/sites/monitoring/lib/python3.13/site-packages/urllib3/connectionpool.py:1097: InsecureRequestWarning: Unverified HTTPS request is being made to host '<HOSTIP>'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings

...

Traceback (most recent call last):
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/server_side_programs/v1_unstable/_crash_reporting.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/plugins/proxmox_ve/special_agent/agent_proxmox_ve.py", line 531, in main
    return agent_proxmox_ve_main(parse_arguments(sys.argv[1:]))
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/plugins/proxmox_ve/special_agent/agent_proxmox_ve.py", line 328, in agent_proxmox_ve_main
    for name, content in _create_node_sections(
                         ~~~~~~~~~~~~~~~~~~~~~^
        node,
        ^^^^^
    ...<6 lines>...
        data,
        ^^^^^
    ):
    ^
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/plugins/proxmox_ve/special_agent/agent_proxmox_ve.py", line 378, in _create_node_sections
    status=node["subscription"]["status"],
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
KeyError: 'status'
Failed: 'status' - please submit a crash report! (Crash-ID: eb682010-4322-11f1-8000-bc2411a8ad4d)(!!)

It appears the issue is that the API data does not contain subscription info (which was fine in 2.4.x).

Typical special agent programming bug: don’t expect the data to be there, even if you think it should be :rofl:
If this code were the same as in the old agent, you would have no problem.
@martin.hirschvogel some more work to do
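
The point about not assuming a key is present can be illustrated with a minimal sketch. It is not the actual fix from the Checkmk developers; the helper names `subscription_status` and `node_total_cpu` are hypothetical, and only the keys `subscription`, `status` and `maxcpu` are taken from the tracebacks in this thread.

```python
from typing import Any, Mapping, Optional


def subscription_status(node: Mapping[str, Any]) -> Optional[str]:
    """Return the subscription status, or None if the API omitted it.

    Hypothetical helper: replaces the direct lookup
    node["subscription"]["status"] that raises KeyError in the traceback.
    """
    # "or {}" also covers the case where the key exists but holds None.
    subscription = node.get("subscription") or {}
    return subscription.get("status")


def node_total_cpu(node: Mapping[str, Any]) -> Optional[int]:
    """Return the node's CPU count, or None if "maxcpu" is missing."""
    return node.get("maxcpu")


if __name__ == "__main__":
    # A node without subscription data no longer crashes the lookup.
    print(subscription_status({"subscription": {}}))
    print(node_total_cpu({}))
```

With this pattern the caller has to decide what a missing value means (skip the section, report UNKNOWN, and so on) instead of the whole agent run failing.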

Works on my Proxmox monitoring :wink:

thanks, will forward!

We are working on it. Should be fixed in 2.5.0p1

As Martin said, he already created an internal ticket for this and we will fix this.

There is just one thing I am wondering about: Which version of Proxmox are you using @pep ?

That would be 9.1.9.

same issue with Proxmox 8.4.18

Hello everyone! Sorry for the inconvenience. The issue will be fixed with Werk #19459: Fix Proxmox VE Node Info handling of missing subscription status

Thanks!

Best,
Luka

Oh well, now it crashes on a different JSON key (maxcpu). Also, the crash reports can no longer be opened on my instance (the loading screen takes forever):

Traceback (most recent call last):
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/server_side_programs/v1_unstable/_crash_reporting.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/plugins/proxmox_ve/special_agent/agent_proxmox_ve.py", line 531, in main
    return agent_proxmox_ve_main(parse_arguments(sys.argv[1:]))
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/plugins/proxmox_ve/special_agent/agent_proxmox_ve.py", line 328, in agent_proxmox_ve_main
    for name, content in _create_node_sections(
                         ~~~~~~~~~~~~~~~~~~~~~^
        node,
        ^^^^^
    ...<6 lines>...
        data,
        ^^^^^
    ):
    ^
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/plugins/proxmox_ve/special_agent/agent_proxmox_ve.py", line 391, in _create_node_sections
    node_total_cpu=node["maxcpu"],
                   ~~~~^^^^^^^^^^
KeyError: 'maxcpu'

Failed: 'maxcpu' - please submit a crash report! (Crash-ID: eab4f7f4-496e-11f1-8000-bc2411a8ad4d)(!!)

This error comes from a feature that did not exist in the old 2.4 special agent.
And again it proves why a special agent should not process the returned data too much. It would be far easier if this special agent output the raw JSON data and let the checks process the results.
Then you have at most one crashed check, not a whole special agent that stops working.
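
The design suggested here can be sketched roughly as follows. This is an illustration under assumptions, not how the Proxmox VE special agent or the Checkmk plugin API is actually implemented: the section name `proxmox_ve_node_raw`, all function names and the plain-string check results are hypothetical. Only the `<<<name:sep(0)>>>` header style and the keys `subscription`, `status` and `maxcpu` come from the output in this thread.

```python
import json
from typing import Any, Callable, Dict, List, Mapping


def render_raw_section(name: str, payload: Mapping[str, Any]) -> str:
    """Agent side: emit the API response untouched as a single JSON line."""
    return f"<<<{name}:sep(0)>>>\n{json.dumps(payload, sort_keys=True)}\n"


def parse_section(lines: List[str]) -> Dict[str, Any]:
    """Check side: decode the JSON line that follows the section header."""
    return json.loads(lines[0])


def check_subscription(section: Mapping[str, Any]) -> str:
    status = (section.get("subscription") or {}).get("status")
    if status is None:
        return "UNKNOWN: no subscription data"
    return f"OK: subscription {status}"


def check_cpu(section: Mapping[str, Any]) -> str:
    # Deliberately unguarded, to show that a failure stays local to this check.
    return f"OK: {section['maxcpu']} CPUs"


def run_checks(
    section: Mapping[str, Any],
    checks: Mapping[str, Callable[[Mapping[str, Any]], str]],
) -> Dict[str, str]:
    """Run every check on its own so one crash does not affect the others."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = check(section)
        except KeyError as exc:
            results[name] = f"CRASHED: missing key {exc}"
    return results


if __name__ == "__main__":
    text = render_raw_section("proxmox_ve_node_raw", {"subscription": {"status": "active"}})
    section = parse_section(text.splitlines()[1:])
    print(run_checks(section, {"subscription": check_subscription, "cpu": check_cpu}))
```

In this sketch the missing `maxcpu` key only breaks the CPU check, while the subscription check still returns a result.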

@martin.hirschvogel @sebkir

Hello @pep and @andreas-doehler,

I have looked at the last issue you mentioned. I am attaching an MKP that addresses it. Can you please test it and let me know if it works?

Thanks!

proxmox_ve_allocation-0.0.1.mkp (6.7 KB)

The problem is not this single error but the very bad design of special agents that query JSON APIs. Until that design is fixed, problems like this will come up again and again. A generic JSON approach would also be a good start towards a generic JSON agent.

How can I verify that the MKP’s version of the special agent is used? I’ve added and enabled the extension and for good measure restarted checkmk, but checking the Proxmox host still fails with the same error:

Traceback (most recent call last):
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/server_side_programs/v1_unstable/_crash_reporting.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/plugins/proxmox_ve/special_agent/agent_proxmox_ve.py", line 531, in main
    return agent_proxmox_ve_main(parse_arguments(sys.argv[1:]))
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/plugins/proxmox_ve/special_agent/agent_proxmox_ve.py", line 328, in agent_proxmox_ve_main
    for name, content in _create_node_sections(
                         ~~~~~~~~~~~~~~~~~~~~~^
        node,
        ^^^^^
    ...<6 lines>...
        data,
        ^^^^^
    ):
    ^
  File "/omd/sites/monitoring/lib/python3.13/site-packages/cmk/plugins/proxmox_ve/special_agent/agent_proxmox_ve.py", line 391, in _create_node_sections
    node_total_cpu=node["maxcpu"],
                   ~~~~^^^^^^^^^^
KeyError: 'maxcpu'

Failed: 'maxcpu' - please submit a crash report! (Crash-ID: ecaf5786-4c96-11f1-8000-bc2411a8ad4d)(!!)

My own Proxmox system also works without this MKP.
System information → 3 nodes - version 9.1.9