CMK version: 2.4.0p2
OS version: CheckMK CEE Appliance
Error message:
May 28 17:26:13 promedev01 journal[2050538]: [cmk-update-agent] WARNING: Agent not updated yet, but found no agent package file. Discarding pending agent hash.
WARN 2025-05-28 17:29:00
May 28 17:29:19 promedev01 journal[2051314]: [cmk-update-agent] ERROR: Content of the state file is corrupted or inaccessible. Falling back to backup state file /opt/checkmk/agent/default/runtime/cmk-update-agent.state.bak. Some data may be lost, though.New state data will be saved to /opt/checkmk/agent/default/runtime/cmk-update-agent.state
May 28 17:29:20 promedev01 journal[2051314]: [cmk-update-agent] ERROR: Content of the state file is corrupted or inaccessible. Falling back to backup state file /opt/checkmk/agent/default/runtime/cmk-update-agent.state.bak. Some data may be lost, though.New state data will be saved to /opt/checkmk/agent/default/runtime/cmk-update-agent.state
After a few days, the updater stops emitting any errors, but also stops attempting to communicate/update entirely. For example, on 2025-06-03 the log only shows:
2025-06-03 11:48:00,588 [2563505] DEBUG: Starting Checkmk Agent Updater v2.4.0p2
2025-06-03 11:48:00,589 [2563505] DEBUG: Successfully read /opt/checkmk/agent/default/package/agent/agent_info.json.
2025-06-03 11:48:00,589 [2563505] DEBUG: Successfully read /opt/checkmk/agent/default/runtime/cmk-update-agent.state.
2025-06-03 11:48:00,589 [2563505] DEBUG: Successfully read /opt/checkmk/agent/default/package/config/cmk-update-agent.cfg.
2025-06-03 11:48:00,589 [2563505] DEBUG: Updating the certificate store "/opt/checkmk/agent/default/runtime/cas/all_certs.pem"...
2025-06-03 11:48:00,593 [2563505] INFO: Updated the certificate store "/opt/checkmk/agent/default/runtime/cas/all_certs.pem" with 3 certificate(s)
2025-06-03 11:48:00,594 [2563505] DEBUG: Running agent updater in InstallMode... Found no pending agent hash for installation. Nothing to do for us.
2025-06-03 11:48:00,594 [2563505] DEBUG: Done.
But notice that there is no “agent package file” being fetched or applied after 2025-06-01 (i.e. no further communication with the deployment server).
Steps to reproduce / Installation procedure:
- Install the Checkmk agent RPM under root:
rpm -Uvh check-mk-agent-2.4.0-48baa5de9b4f9d35.noarch.rpm
- Register and enable automatic updates as the non-privileged user cmk-agent:
sudo -u cmk-agent cmk-update-agent register \
-x -s checkmkserver.server.cetin -i dev \
-H $(hostname -s) -p https \
-U automation-agent-registration -S SECRET -vv
Output during registration (abbreviated):
Successfully read /opt/checkmk/agent/default/package/agent/agent_info.json.
Successfully read /opt/checkmk/agent/default/runtime/cmk-update-agent.state.
Successfully read /opt/checkmk/agent/default/package/config/cmk-update-agent.cfg.
…
Response from Agent Bakery:
{'result_code': 0, 'result': {'host_secret': '***', 'update_url': '', 'monitored': True}, 'severity': 'success'}
Applying new update URL from deployment server
Successfully scheduled an automatic update with next Checkmk Agent execution.
Saved your registration settings to /opt/checkmk/agent/default/runtime/cmk-update-agent.state.
Done.
- Confirm that /opt/checkmk/agent/default/runtime/ is owned by cmk-agent:cmk-agent and that permissions are set so that only cmk-agent can read/write the state files. For reference, here is the output of ls -lh /opt/checkmk/agent/default/runtime as of June 3:
total 23M
drwxr-xr-x. 2 cmk-agent cmk-agent 64 Jun 3 11:43 cache
drwxr-xr-x. 2 cmk-agent cmk-agent 27 Jun 3 11:43 cas
-rw-r--r--. 1 root root 2.4M Jun 3 11:43 cmk-update-agent.log
-rw-r--r--. 1 cmk-agent cmk-agent 9.9M Jun 1 13:57 cmk-update-agent.log.1
-rw-r--r--. 1 cmk-agent cmk-agent 9.9M May 25 02:09 cmk-update-agent.log.2
-rw-------. 1 cmk-agent cmk-agent 297 Jun 1 13:57 cmk-update-agent.state
-rw-------. 1 cmk-agent cmk-agent 297 Jun 1 13:57 cmk-update-agent.state.bak
drwxr-x---. 2 cmk-agent cmk-agent 66 May 29 13:23 controller
drwxr-x---. 2 cmk-agent cmk-agent 6 May 29 13:23 job
drwxr-x---. 2 cmk-agent cmk-agent 6 May 29 13:23 log
drwxr-xr-x. 2 cmk-agent cmk-agent 152 May 29 13:56 persisted
drwxr-xr-x. 2 cmk-agent cmk-agent 86 May 21 21:54 rtc_remotes
drwxr-x---. 2 cmk-agent cmk-agent 6 May 29 13:23 spool
- Wait for the next automatic run of cmk-update-agent (the agent cron/daemon runs every 5 minutes by default).
- Observe that on the first automatic run (around 2025-05-28), the updater logs the “state file is corrupted or inaccessible” errors shown above.
- After a day or two (around 2025-06-01), those errors disappear—but the updater also stops fetching anything new from the server. It simply logs “Found no pending agent hash for installation. Nothing to do for us,” even though the deployment server has newer packages.
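The ownership check from the steps above could be scripted roughly as follows. This is a minimal sketch and not part of Checkmk: list_foreign_owned is a hypothetical helper name, and it relies only on the standard find and ls tools. It prints every entry under a directory whose owner differs from the expected user, which would, for instance, surface the root:root ownership that the listing above shows for cmk-update-agent.log.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Checkmk command): list every file or directory
# under DIR that is NOT owned by EXPECTED_USER (a user name or numeric UID).
# Prints nothing when all entries are owned by the expected user.
list_foreign_owned() {
    local dir="$1" expected_user="$2"
    # "! -user" selects entries with a different owner; "ls -ld" shows
    # owner, group and mode for each of them.
    find "$dir" ! -user "$expected_user" -exec ls -ld {} +
}
```

For example, list_foreign_owned /opt/checkmk/agent/default/runtime cmk-agent, run as root so that the subdirectories with mode 750 can be traversed. Whether any mismatch it reports is related to the errors above is not established here.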
What I expected to happen:
- As long as the host remains registered in the bakery and cmk-update-agent is run as user cmk-agent, the agent updater should succeed in downloading/applying new agent packages automatically (or at least retry without “state file corrupted” errors).
- No “state file is corrupted or inaccessible” messages should appear, because /opt/checkmk/agent/default/runtime/cmk-update-agent.state is owned by cmk-agent and mode 600.
- Even if there is “no pending agent hash” at a given moment, the next time a new bakery package is published it should fetch it automatically.
What actually happened:
- Immediately after registration (running under cmk-agent), on the first few runs of cmk-update-agent we see:
ERROR: Content of the state file is corrupted or inaccessible. Falling back to backup state file …
despite the state file having correct ownership and mode (-rw------- cmk-agent:cmk-agent).
- After a couple of days, the updater stops logging errors but also stops attempting any updates. Even though the central bakery has published a newer agent package, the updater reports “Found no pending agent hash for installation. Nothing to do for us.” In other words, it never re-contacts the server looking for new versions.
- This is consistent across multiple servers when the agent is installed and registered under a non-privileged account. We did not register as root (because registering as root prevents auto-updates entirely), so there is no mixed-permission setup. Everything under /opt/checkmk/agent/default/runtime/ is owned by cmk-agent.
- We do not see any SELinux denials in /var/log/audit/audit.log.
- If we temporarily switch to registering/updating as root, the updater fails entirely (refuses to run), so the only way we can keep automatic updates is under a non-privileged user, yet that user’s updater run produces “state file corrupted” errors.
- Once the initial “corrupted state” errors clear (which takes a few days in our case), the updater never again attempts to contact the bakery (no “Fetching content” lines after 2025-06-01).
- I have set the interval for checking for updates to 3 minutes (for test/development purposes).
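To pin down when each message stopped appearing, the updater log could be summarized per day. The following is a rough sketch under the assumption that every line of cmk-update-agent.log starts with the date, as in the 2025-06-03 excerpt above; count_per_day is a hypothetical helper name, not a Checkmk tool, and the search phrases are the ones quoted in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count, per calendar day, the log lines that contain
# a fixed phrase. Assumes each line begins with "YYYY-MM-DD ", as in the
# cmk-update-agent.log excerpt above.
count_per_day() {
    local phrase="$1" logfile="$2"
    # index() does a plain substring match (no regex); $1 is the date field.
    awk -v phrase="$phrase" '
        index($0, phrase) { n[$1]++ }
        END { for (d in n) print d, n[d] }
    ' "$logfile" | sort
}
```

For example, count_per_day "Fetching content" /opt/checkmk/agent/default/runtime/cmk-update-agent.log.1 lists the days on which the updater contacted the server, and the same call with "state file is corrupted" shows when the errors occurred. A day missing from the output means the phrase did not occur in that file on that day.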
