Automatic Updates - non-privileged user bug

CMK version: 2.4.0p2

OS version: CheckMK CEE Appliance


Error message:

May 28 17:26:13 promedev01 journal[2050538]: [cmk-update-agent] WARNING: Agent not updated yet, but found no agent package file. Discarding pending agent hash.
WARN    2025-05-28 17:29:00
May 28 17:29:19 promedev01 journal[2051314]: [cmk-update-agent] ERROR: Content of the state file is corrupted or inaccessible. Falling back to backup state file /opt/checkmk/agent/default/runtime/cmk-update-agent.state.bak. Some data may be lost, though.New state data will be saved to /opt/checkmk/agent/default/runtime/cmk-update-agent.state
May 28 17:29:20 promedev01 journal[2051314]: [cmk-update-agent] ERROR: Content of the state file is corrupted or inaccessible. Falling back to backup state file /opt/checkmk/agent/default/runtime/cmk-update-agent.state.bak. Some data may be lost, though.New state data will be saved to /opt/checkmk/agent/default/runtime/cmk-update-agent.state

After a few days, the updater stops emitting any errors, but also stops attempting to communicate/update entirely. For example, on 2025-06-03 the log only shows:

2025-06-03 11:48:00,588 [2563505] DEBUG: Starting Checkmk Agent Updater v2.4.0p2
2025-06-03 11:48:00,589 [2563505] DEBUG: Successfully read /opt/checkmk/agent/default/package/agent/agent_info.json.
2025-06-03 11:48:00,589 [2563505] DEBUG: Successfully read /opt/checkmk/agent/default/runtime/cmk-update-agent.state.
2025-06-03 11:48:00,589 [2563505] DEBUG: Successfully read /opt/checkmk/agent/default/package/config/cmk-update-agent.cfg.
2025-06-03 11:48:00,589 [2563505] DEBUG: Updating the certificate store "/opt/checkmk/agent/default/runtime/cas/all_certs.pem"...
2025-06-03 11:48:00,593 [2563505] INFO: Updated the certificate store "/opt/checkmk/agent/default/runtime/cas/all_certs.pem" with 3 certificate(s)
2025-06-03 11:48:00,594 [2563505] DEBUG: Running agent updater in InstallMode...  Found no pending agent hash for installation. Nothing to do for us.
2025-06-03 11:48:00,594 [2563505] DEBUG: Done.

Note that no agent package file is fetched or applied after 2025-06-01, i.e. there is no further communication with the deployment server.
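
One way to check this from the log itself, as a rough sketch: the helper below prints the date of the last log line containing “Fetching content”, the string this report uses as the sign that the updater contacted the server. The function name last_fetch_date is made up for illustration, and it assumes every log line starts with the date, as in the excerpt above.

```shell
# Hypothetical helper, not part of Checkmk.
# Prints the date (first space-separated field) of the last line in the given
# log file that contains "Fetching content". Prints nothing if there is none.
last_fetch_date() {
    grep -F 'Fetching content' "$1" | tail -n 1 | cut -d ' ' -f 1
}

# Example (path taken from this report):
#   last_fetch_date /opt/checkmk/agent/default/runtime/cmk-update-agent.log.1
```

Run it against cmk-update-agent.log and each rotated log; an empty result means that file contains no such line.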


Steps to reproduce / Installation procedure:

  1. Install the Checkmk agent RPM as root:
rpm -Uvh check-mk-agent-2.4.0-48baa5de9b4f9d35.noarch.rpm
  2. Register and enable automatic updates as the non-privileged user cmk-agent:
sudo -u cmk-agent cmk-update-agent register \
    -x -s checkmkserver.server.cetin -i dev \
    -H $(hostname -s) -p https \
    -U automation-agent-registration -S SECRET -vv

Output during registration (abbreviated):

Successfully read /opt/checkmk/agent/default/package/agent/agent_info.json.
Successfully read /opt/checkmk/agent/default/runtime/cmk-update-agent.state.
Successfully read /opt/checkmk/agent/default/package/config/cmk-update-agent.cfg.
…  
Response from Agent Bakery:
{'result_code': 0, 'result': {'host_secret': '***', 'update_url': '', 'monitored': True}, 'severity': 'success'}
Applying new update URL from deployment server
Successfully scheduled an automatic update with next Checkmk Agent execution.
Saved your registration settings to /opt/checkmk/agent/default/runtime/cmk-update-agent.state.
Done.
  3. Confirm that /opt/checkmk/agent/default/runtime/ is owned by cmk-agent:cmk-agent and that permissions are set so that only cmk-agent can read/write the state files. For reference, here is the output of ls -lh /opt/checkmk/agent/default/runtime as of June 3:
total 23M
drwxr-xr-x. 2 cmk-agent cmk-agent   64 Jun  3 11:43 cache
drwxr-xr-x. 2 cmk-agent cmk-agent   27 Jun  3 11:43 cas
-rw-r--r--. 1 root      root      2.4M Jun  3 11:43 cmk-update-agent.log
-rw-r--r--. 1 cmk-agent cmk-agent 9.9M Jun  1 13:57 cmk-update-agent.log.1
-rw-r--r--. 1 cmk-agent cmk-agent 9.9M May 25 02:09 cmk-update-agent.log.2
-rw-------. 1 cmk-agent cmk-agent  297 Jun  1 13:57 cmk-update-agent.state
-rw-------. 1 cmk-agent cmk-agent  297 Jun  1 13:57 cmk-update-agent.state.bak
drwxr-x---. 2 cmk-agent cmk-agent   66 May 29 13:23 controller
drwxr-x---. 2 cmk-agent cmk-agent    6 May 29 13:23 job
drwxr-x---. 2 cmk-agent cmk-agent    6 May 29 13:23 log
drwxr-xr-x. 2 cmk-agent cmk-agent  152 May 29 13:56 persisted
drwxr-xr-x. 2 cmk-agent cmk-agent   86 May 21 21:54 rtc_remotes
drwxr-x---. 2 cmk-agent cmk-agent    6 May 29 13:23 spool
  4. Wait for the next automatic run of cmk-update-agent (the agent cron/daemon runs every 5 minutes by default).
  5. Observe that on the first automatic run (around 2025-05-28), the updater logs the “state file is corrupted or inaccessible” errors shown above.
  6. After a few days (around 2025-06-01), those errors disappear, but the updater also stops fetching anything new from the server. It simply logs “Found no pending agent hash for installation. Nothing to do for us,” even though the deployment server has newer packages.
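
The ownership check in the steps above can be cross-checked mechanically. The following is only a sketch: not_owned_by is a made-up helper name, and it simply wraps find to list everything under a directory that does not belong to the given user. In the ls listing above, the current cmk-update-agent.log is shown as root:root, so on that host the check would be expected to print at least that file.

```shell
# Hypothetical helper, not part of Checkmk.
# Lists every file and directory under DIR that is NOT owned by USER.
# Usage: not_owned_by USER DIR
not_owned_by() {
    find "$2" ! -user "$1" -print
}

# Example (user and path taken from this report):
#   not_owned_by cmk-agent /opt/checkmk/agent/default/runtime
```

Some of the subdirectories are mode drwxr-x---, so run it as root or as cmk-agent to make sure find can descend into all of them. No output means everything belongs to the given user.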

What I expected to happen:

  • As long as the host remains registered in the bakery and cmk-update-agent is run as user cmk-agent, the agent updater should succeed in downloading/applying new agent packages automatically (or at least retry without “state file corrupted” errors).
  • No “state file is corrupted or inaccessible” messages should appear, because /opt/checkmk/agent/default/runtime/cmk-update-agent.state is owned by cmk-agent and mode 600.
  • Even if there is “no pending agent hash” at a given moment, the next time a new bakery package is published it should fetch it automatically.

What actually happened:

  • Immediately after registration (running under cmk-agent), on the first few runs of cmk-update-agent we see:
ERROR: Content of the state file is corrupted or inaccessible. Falling back to backup state file …  

despite the state file having correct ownership and mode (-rw------- cmk-agent:cmk-agent).

  • After a couple of days, the updater stops logging errors but also stops attempting any updates. Even though the central bakery has published a newer agent package, the updater reports “Found no pending agent hash for installation. Nothing to do for us.” In other words, it never re-contacts the server looking for new versions.
  • This is consistent across multiple servers when the agent is installed and registered under a non-privileged account. We did not register as root (because registering as root prevents auto-updates entirely), so there should be no mixed-permission setup. Everything under /opt/checkmk/agent/default/runtime/ is owned by cmk-agent, except the current cmk-update-agent.log, which the listing above shows as root:root.
  • We do not see any SELinux denials in /var/log/audit/audit.log.
  • If we temporarily switch to registering/updating as root, the updater fails entirely (refuses to run), so the only way we can keep automatic updates is under a non-privileged user—yet that user’s updater run produces “state file corrupted” errors.
  • Once the initial “corrupted state” errors clear (this takes a few days in our case), the updater never again attempts to contact the bakery (no “Fetching content” lines after 2025-06-01).
  • I have set the interval for checking for updates to 3 minutes (for test/development purposes).

Good morning @HubbaBubba,

thank you for the extensive report.
I have opened an internal ticket for investigation.

Sunny Greetings
Hartmut

Good Morning @HubbaBubba,

the bug should be fixed with Werk #18271 (Linux non-root agent deployment: Fix issues with agent updater), which is part of 2.4.0p8.

Can you please confirm (and then mark this thread as solved)?
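
For confirming, one option is to read the version from the “Starting Checkmk Agent Updater v…” line that the updater writes at the start of each run (see the log excerpt earlier in this thread) and compare it with 2.4.0p8. This is a minimal sketch: both function names are made up for illustration, and the comparison only understands versions of the form 2.4.0pN.

```shell
# Hypothetical helpers, not part of Checkmk.

# Prints the version from the last "Starting Checkmk Agent Updater v..." line
# in the given log file (for example 2.4.0p2). Prints nothing if none is found.
updater_version() {
    sed -n 's/.*Starting Checkmk Agent Updater v\([^ ]*\).*/\1/p' "$1" | tail -n 1
}

# Succeeds if the version has the form 2.4.0pN with N >= 8.
# Other version schemes are deliberately not handled and count as failure.
is_2_4_0p8_or_later() {
    case "$1" in
        2.4.0p[0-9]*) [ "${1#2.4.0p}" -ge 8 ] 2>/dev/null ;;
        *) return 1 ;;
    esac
}

# Example (path taken from this thread):
#   v=$(updater_version /opt/checkmk/agent/default/runtime/cmk-update-agent.log)
#   is_2_4_0p8_or_later "$v" && echo "updater $v should include the fix"
```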

Sunny Greetings and thank you
Hartmut
