Proxmox VE special agent fails with JSONDecodeError after host system ran out of disk space

CMK version: check-mk-raw:2.4.0p10 (Docker)
OS version: Debian 12.11

Error message: [agent] Success, [special_proxmox_ve] json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) CRIT, [piggyback] Success (but no data found for this host), execution time 2.6 sec

Output of “cmk --debug -vvn hostname”:

OMD[home]:~$ cmk --debug -vvn --no-cache proxmox.local
value store: loading from disk
Checkmk version 2.4.0p10
+ FETCHING DATA
  Source: SourceInfo(hostname='proxmox.local', ipaddress='192.168.137.10', ident='agent', fetcher_type=<FetcherType.TCP: 8>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f58874a1e20]
Read from cache: AgentFileCache(path_template=/omd/sites/home/tmp/check_mk/cache/proxmox.local, max_age=MaxAge(checking=0, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=1)
Connecting via TCP to 192.168.137.10:6556 (5.0s timeout)
Detected transport protocol: TransportProtocol.PLAIN
Reading data from agent
Closing TCP connection to 192.168.137.10:6556
[cpu_tracking] Stop [7f58874a1e20 - Snapshot(process=posix.times_result(user=0.010000000000000009, system=0.0, children_user=0.0, children_system=0.0, elapsed=1.8299999982118607))]
  Source: SourceInfo(hostname='proxmox.local', ipaddress='192.168.137.10', ident='special_proxmox_ve', fetcher_type=<FetcherType.SPECIAL_AGENT: 6>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f5887087620]
Read from cache: AgentFileCache(path_template=/omd/sites/home/tmp/check_mk/data_source_cache/special_proxmox_ve/proxmox.local, max_age=MaxAge(checking=0, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=1)
Calling: /omd/sites/home/share/check_mk/agents/special/agent_proxmox_ve --pwstore=4@0@/omd/sites/home/var/check_mk/passwords_merged@uuid268d4ab6-dd0b-4de2-8299-19b3082d9dd8 -u checkmk@pve -p '********************************' --no-cert-check --timeout 50 proxmox.local
Get data from program
[cpu_tracking] Stop [7f5887087620 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.61, children_system=0.08, elapsed=0.830000001937151))]
  Source: SourceInfo(hostname='proxmox.local', ipaddress='192.168.137.10', ident='piggyback', fetcher_type=<FetcherType.PIGGYBACK: 4>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f58874a2bd0]
Read from cache: NoCache(path_template=/dev/null, max_age=MaxAge(checking=0.0, discovery=0.0, inventory=0.0), simulation=False, use_only_cache=False, file_cache_mode=1)
0 piggyback files for 'proxmox.local'.
0 piggyback files for '192.168.137.10'.
Get piggybacked data
[cpu_tracking] Stop [7f58874a2bd0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
[cpu_tracking] Start [7f5887087f50]
+ PARSE FETCHER RESULTS
<<<check_mk>>> / Transition NOOPParser -> HostSectionParser
<<<cmk_agent_ctl_status:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<checkmk_agent_plugins_lnx:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<labels:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<df_v2>>> / Transition HostSectionParser -> HostSectionParser
<<<df_v2>>> / Transition HostSectionParser -> HostSectionParser
<<<systemd_units>>> / Transition HostSectionParser -> HostSectionParser
<<<zfsget:sep(9)>>> / Transition HostSectionParser -> HostSectionParser
<<<zfsget>>> / Transition HostSectionParser -> HostSectionParser
<<<nfsmounts_v2:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<cifsmounts>>> / Transition HostSectionParser -> HostSectionParser
<<<mounts>>> / Transition HostSectionParser -> HostSectionParser
<<<ps_lnx>>> / Transition HostSectionParser -> HostSectionParser
<<<mem>>> / Transition HostSectionParser -> HostSectionParser
<<<cpu>>> / Transition HostSectionParser -> HostSectionParser
<<<uptime>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_bonding:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<tcp_conn_stats>>> / Transition HostSectionParser -> HostSectionParser
<<<diskstat>>> / Transition HostSectionParser -> HostSectionParser
<<<kernel>>> / Transition HostSectionParser -> HostSectionParser
<<<md>>> / Transition HostSectionParser -> HostSectionParser
<<<vbox_guest>>> / Transition HostSectionParser -> HostSectionParser
<<<postfix_mailq>>> / Transition HostSectionParser -> HostSectionParser
<<<postfix_mailq_status:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<zpool_status>>> / Transition HostSectionParser -> HostSectionParser
<<<zpool>>> / Transition HostSectionParser -> HostSectionParser
<<<job>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_thermal:sep(124)>>> / Transition HostSectionParser -> HostSectionParser
<<<pvecm_status:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<pvecm_nodes>>> / Transition HostSectionParser -> HostSectionParser
<<<chrony:cached(1756055552,120)>>> / Transition HostSectionParser -> HostSectionParser
<<<local:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<lvm_vgs>>> / Transition HostSectionParser -> HostSectionParser
<<<lvm_lvs:sep(124)>>> / Transition HostSectionParser -> HostSectionParser
<<<lmsensors>>> / Transition HostSectionParser -> HostSectionParser
<<<hddstatus>>> / Transition HostSectionParser -> HostSectionParser
<<<ipmi>>> / Transition HostSectionParser -> HostSectionParser
<<<smart>>> / Transition HostSectionParser -> HostSectionParser
  HostKey(hostname='proxmox.local', source_type=<SourceType.HOST: 1>)  -> Add sections: ['check_mk', 'checkmk_agent_plugins_lnx', 'chrony', 'cifsmounts', 'cmk_agent_ctl_status', 'cpu', 'df_v2', 'diskstat', 'hddstatus', 'ipmi', 'job', 'kernel', 'labels', 'lmsensors', 'lnx_bonding', 'lnx_if', 'lnx_thermal', 'local', 'lvm_lvs', 'lvm_vgs', 'md', 'mem', 'mounts', 'nfsmounts_v2', 'postfix_mailq', 'postfix_mailq_status', 'ps_lnx', 'pvecm_nodes', 'pvecm_status', 'smart', 'systemd_units', 'tcp_conn_stats', 'uptime', 'vbox_guest', 'zfsget', 'zpool', 'zpool_status']
  HostKey(hostname='proxmox.local', source_type=<SourceType.HOST: 1>)  -> Add sections: []
Received no piggyback data
CPU load             15 min load: 1.14, 15 min load per core: 0.09 (12 cores)
CPU utilization      Total CPU: 18.64%
...
Temperature SMART /dev/nvme0n1 38 °C
Temperature Zone 0   Temperature: 16.8 °C
Temperature Zone 1   Temperature: 16.8 °C
Uptime               Up since 2025-08-19 22:15:03, Uptime: 4 days 20 hours
0 piggyback files for 'proxmox.local'.
[cpu_tracking] Stop [7f5887087f50 - Snapshot(process=posix.times_result(user=0.030000000000000027, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.03999999910593033))]
[agent] Success, [special_proxmox_ve] json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)(!!), [piggyback] Success (but no data found for this host), execution time 2.7 sec | execution_time=2.700 user_time=0.040 system_time=0.000 children_user_time=0.610 children_system_time=0.080 cmk_time_agent=1.820 cmk_time_ds=0.140
Agent exited with code 1: Agent failed - please submit a crash report! (Crash-ID: a0779d74-810d-11f0-b773-4a232e7b2aa9)

Traceback (most recent call last):
  File "/omd/sites/home/lib/python3/cmk/special_agents/v0_unstable/agent_common.py", line 151, in _special_agent_main_core
    return main_fn(args)
           ^^^^^^^^^^^^^
  File "/omd/sites/home/lib/python3/cmk/special_agents/agent_proxmox_ve.py", line 480, in agent_proxmox_ve_main
    logged_backup_data = fetch_backup_data(args, session, data["nodes"])
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/home/lib/python3/cmk/special_agents/agent_proxmox_ve.py", line 401, in fetch_backup_data
    with JsonCachedData(
         ^^^^^^^^^^^^^^^
  File "/omd/sites/home/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/omd/sites/home/lib/python3/cmk/special_agents/v0_unstable/misc.py", line 369, in JsonCachedData
    cache = json.loads(storage.read(key=storage_key, default="{}"))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/home/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/home/lib/python3.12/json/decoder.py", line 338, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/home/lib/python3.12/json/decoder.py", line 356, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)(!!)
OMD[home]:~$

Hi everyone,

I’m having a problem with the agent_proxmox_ve special agent and hope you can help me out. I am using Checkmk Raw 2.4.0p10, running in a Docker container.

The Problem:

Since about 3 AM last night, the check for my Proxmox environment has been failing. The check returns CRIT with the following error message:

[agent] Success, [special_proxmox_ve] json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) CRIT, [piggyback] Success (but no data found for this host), execution time 2.6 sec

The corresponding traceback is:

Traceback (most recent call last):
File "/omd/sites/home/lib/python3/cmk/special_agents/v0_unstable/agent_common.py", line 151, in _special_agent_main_core
  return main_fn(args)
File "/omd/sites/home/lib/python3/cmk/special_agents/agent_proxmox_ve.py", line 480, in agent_proxmox_ve_main
  logged_backup_data = fetch_backup_data(args, session, data["nodes"])
File "/omd/sites/home/lib/python3/cmk/special_agents/agent_proxmox_ve.py", line 401, in fetch_backup_data
  with JsonCachedData(
File "/omd/sites/home/lib/python3.12/contextlib.py", line 137, in __enter__
  return next(self.gen)
File "/omd/sites/home/lib/python3/cmk/special_agents/v0_unstable/misc.py", line 369, in JsonCachedData
  cache = json.loads(storage.read(key=storage_key, default="{}"))
File "/omd/sites/home/lib/python3.12/json/__init__.py", line 346, in loads
  return _default_decoder.decode(s)
File "/omd/sites/home/lib/python3.12/json/decoder.py", line 338, in decode
  obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/omd/sites/home/lib/python3.12/json/decoder.py", line 356, in raw_decode
  raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Important Context:

Exactly when the error first occurred, the host system running the Checkmk container ran out of disk space due to log files. I have since fixed the storage issue, but the error in Checkmk persists.

My suspicion is that due to the full disk, a cache file or another temporary file used by the agent was not written completely or was corrupted, and this is now causing the JSONDecodeError.

Troubleshooting Steps Taken So Far:

  1. Site Restart: An omd restart <sitename> was performed and did not solve the problem.
  2. Clearing Cache: I deleted the suspected cache files under tmp/check_mk/cache/ and tmp/check_mk/json_cache/ for the Proxmox host. This also did not help.
  3. API Test: Manual queries to the Proxmox API via curl (both for authentication and for fetching tasks) work perfectly and return valid JSON. The API itself seems to be fine.
  4. Service Rediscovery (Tabula Rasa): A full service rediscovery for the Proxmox host (cmk -II --tabula-rasa <hostname>) was also performed, without success.
  5. Manual Agent Execution: When I execute the agent directly on the command line (e.g., with /opt/omd/versions/default/lib/python3/cmk/special_agents/agent_proxmox_ve.py ...), the exact same JSONDecodeError with the identical traceback appears. So the error is reproducible in the agent script itself, independently of how Checkmk calls it.

The error seems to stem from a persistent, faulty file outside of the usual cache directories that I haven’t been able to locate.
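
To narrow down where such a file could be, here is a minimal sketch that lists zero-byte files under a directory tree, since a full disk often leaves empty files behind. It uses only the Python standard library. The function name find_empty_files is made up for this post, and which directory to scan (for example the site's ~/tmp/check_mk or ~/var/check_mk) is my assumption, not something I know about the agent's storage layout.

```python
from pathlib import Path


def find_empty_files(root: Path) -> list[Path]:
    """Return all regular files below root that have a size of 0 bytes."""
    empty = []
    for path in sorted(root.rglob("*")):
        try:
            if path.is_file() and path.stat().st_size == 0:
                empty.append(path)
        except OSError:
            # Skip entries that vanish or cannot be inspected while scanning.
            continue
    return empty
```

Run as the site user, something like find_empty_files(Path.home() / "var" / "check_mk") would return the candidates to inspect. Not every empty file is a problem, so the result is only a list of places to look.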

Does anyone have an idea where the agent_proxmox_ve might be storing other data, or what else I can do to reset this state?

Thanks in advance!

Hi everyone,

Here is a final update with the definitive, verified solution for the issue with the Proxmox Special Agent.

The root cause was a specific bug in the Checkmk (2.4.0p10) agent caching framework.

The exact chain of events:

  1. Missing Cache File: For some reason (e.g., after a restart or on the initial run), the agent could not find the expected cache file under ~/tmp/check_mk/json_cache/.
  2. Faulty Read Attempt: The agent’s caching logic first tries to read a potential cache file before querying the API.
  3. The Bug: Since no file was found, this read attempt fails. Due to a bug, the responsible function in misc.py incorrectly returns an empty string ("") instead of a valid JSON default ("{}").
  4. The Crash: The attempt to parse this empty string (json.loads("")) inevitably leads to the known JSONDecodeError. The agent’s write routine, which would have correctly created the directory and file at the end, is never reached.

The Solution: Fixing the Bug with a Patch

The only required action is to patch the faulty function in the Checkmk library. Manually creating directories is not necessary.

File: /omd/versions/default/lib/python3/cmk/special_agents/v0_unstable/misc.py

Procedure: Find the original line (around line 369):

cache = json.loads(storage.read(key=storage_key, default="{}"))

And replace this single line with the following lines, which handle the faulty empty string:

# --- BEGIN PATCH ---
raw_from_storage = storage.read(key=storage_key, default="{}")
if not raw_from_storage:
    raw_from_storage = "{}"
cache = json.loads(raw_from_storage)
# --- END PATCH ---

With this patch, an empty read result no longer crashes the agent. The agent can then complete its work and, at the end, call the correct write routine, which creates the cache directory and files as intended.
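
For anyone who wants to try the idea outside of Checkmk first, here is a self-contained sketch of the same fallback. FakeStorage and load_cache are hypothetical names invented for this example and are not the classes from misc.py; only the read(key=..., default=...) call shape is taken from the traceback. As a variation on the patch above, the sketch also falls back to an empty cache when the stored text is truncated or otherwise invalid JSON, which is my own addition.

```python
import json


class FakeStorage:
    """Hypothetical stand-in for the agent's storage object (not Checkmk code)."""

    def __init__(self, files: dict[str, str]) -> None:
        self._files = files

    def read(self, key: str, default: str) -> str:
        # Return the stored text, or the default if the key is unknown.
        return self._files.get(key, default)


def load_cache(storage: FakeStorage, storage_key: str) -> dict:
    """Load a JSON object from storage, treating unusable content as empty."""
    raw_from_storage = storage.read(key=storage_key, default="{}")
    if not raw_from_storage:
        # Same idea as the patch: an empty string becomes an empty object.
        raw_from_storage = "{}"
    try:
        cache = json.loads(raw_from_storage)
    except json.JSONDecodeError:
        # Variation: a truncated or corrupted file is treated like no cache.
        return {}
    return cache if isinstance(cache, dict) else {}


storage = FakeStorage({"empty": "", "cut_off": '{"a": ', "good": '{"a": 1}'})
for key in ("missing", "empty", "cut_off", "good"):
    print(key, load_cache(storage, key))
```

Discarding an unreadable cache seems acceptable here because the agent can presumably rebuild it from the API on the next run, but that is an assumption about the agent's behaviour, not something I have verified in its code.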

Everything has been running stably since applying the patch. It would be great if a developer could look into this and have the problem fixed in p11.

This bug is still unpatched in 2.4.0p13. Thanks for the patch (fixed the problem here).