CheckMK agent 2.2 on Windows 10 computer failing to start

Check MK Agent 2.2.0p3
Windows 10 22H2

OS version:
[screenshot]

Error message:

This has been happening since we upgraded CheckMK from 1.6 to 2.2.0. On this one computer, the check_mk_agent.exe service runs for a second or two, shuts down, then restarts, and it loops like that forever.

All the firewalls are off:

[screenshot]

There are no other difficulties connecting into the computer with the failure.

I have completely uninstalled the CheckMK agent, manually deleted the CheckMK folder from ProgramData, searched Regedit for both “CheckMK” and “Check_MK”, and deleted all matching entries before re-installing. The agent immediately returns to the previous behavior.

Here are related entries in the application log in Event Viewer:

So, what can cause the Windows CheckMK Agent to not play nice?

Do you have any usable information inside the agent log?
→ C:\ProgramData\checkmk\agent\log\

Is any endpoint security software etc. active?
Ralf

Great question.

Looking in the log folder, there is the current log (check_mk.log) plus five older logs. The oldest (#5) goes back only to 10:45 pm last night, so CheckMK is clearly writing log entries very quickly.

Here is a section of the most recent log file:

2023-09-28 08:39:34.664 [srv 15524] try to kill
2023-09-28 08:39:38.515 [srv 28332] [Trace] Enabled Base
2023-09-28 08:39:38.537 [srv 28332] [Trace] Setting root. service: 'CheckMkService', preset: ''
2023-09-28 08:39:38.542 [srv 28332] [Trace] Try service: 'CheckMkService'
2023-09-28 08:39:38.547 [srv 28332] [Trace] Try registry 'CheckMkService'
2023-09-28 08:39:38.552 [srv 28332] [Trace] Service is found 'C:\Program Files (x86)\checkmk\service\check_mk_agent.exe'
2023-09-28 08:39:38.557 [srv 28332] Set root 'C:\Program Files (x86)\checkmk\service' from registry 'CheckMkService'
2023-09-28 08:39:38.562 [srv 28332] [Trace] Try registry 'CheckMkService'
2023-09-28 08:39:38.567 [srv 28332] Protect file from User write 'C:\Program Files (x86)\checkmk\service\check_mk_agent.exe'
2023-09-28 08:39:38.572 [srv 28332] Protect path from User access 'C:\Program Files (x86)\checkmk\service'
2023-09-28 08:39:38.604 [srv 28332] [Trace] Using root = 'C:\Program Files (x86)\checkmk\service' and data = 'C:\ProgramData\checkmk\agent' folders 
2023-09-28 08:39:38.615 [srv 28332] COM Initialized
2023-09-28 08:39:38.620 [srv 28332] COM initialized
2023-09-28 08:39:38.625 [srv 28332] Found root config on path C:\Program Files (x86)\checkmk\service\check_mk.yml
2023-09-28 08:39:38.643 [srv 28332] [Trace] Enabled Debug
2023-09-28 08:39:38.648 [srv 28332] Loaded start config 'C:\Program Files (x86)\checkmk\service\check_mk.yml','C:\ProgramData\checkmk\agent\bakery','C:\ProgramData\checkmk\agent\check_mk.user.yml'
2023-09-28 08:39:38.653 [srv 28332] service to run
2023-09-28 08:39:38.660 [srv 28332] Service Main
2023-09-28 08:39:38.665 [srv 28332] Service handlers registered
2023-09-28 08:39:38.670 [srv 28332] [Trace] Installing cap file 'C:\Program Files (x86)\checkmk\service\install\plugins.cap'
2023-09-28 08:39:38.675 [srv 28332] Timestamp OK, checking file content...
2023-09-28 08:39:38.680 [srv 28332] [Trace] Installing of CAP file is not required
2023-09-28 08:39:38.685 [srv 28332] [Trace] Installing yml file 'C:\Program Files (x86)\checkmk\service\install\check_mk.install.yml'
2023-09-28 08:39:38.690 [srv 28332] Target File 'C:\ProgramData\checkmk\agent\install\check_mk.install.yml' is absent, reinstall is mandatory
2023-09-28 08:39:38.695 [srv 28332] Reinstalling 'C:\ProgramData\checkmk\agent\install\check_mk.install.yml' with 'C:\Program Files (x86)\checkmk\service\install\check_mk.install.yml'
2023-09-28 08:39:38.700 [srv 28332] This Option/YML installation form MSI is ENABLED
2023-09-28 08:39:38.705 [srv 28332] Remove 'C:\ProgramData\checkmk\agent\install\check_mk.install.yml' [OK]
2023-09-28 08:39:38.711 [srv 28332] Supplied yaml 'C:\Program Files (x86)\checkmk\service\install\check_mk.install.yml' will not be installed
2023-09-28 08:39:38.716 [srv 28332] [Trace] Copy file 'C:\Program Files (x86)\checkmk\service\install\checkmk.dat' to 'C:\ProgramData\checkmk\agent\install\checkmk.dat'
2023-09-28 08:39:38.721 [srv 28332] Timestamp OK, checking file content...
2023-09-28 08:39:38.726 [srv 28332] [Trace] Copy is not required, the file is already exists
2023-09-28 08:39:38.731 [srv 28332] Skip installing user yml file
2023-09-28 08:39:38.736 [srv 28332] Timestamp OK, checking file content...
2023-09-28 08:39:38.742 [srv 28332] Starting upgrade(migration) process...
2023-09-28 08:39:38.747 [srv 28332] [Trace] Legacy Agent not found Upgrade is not possible
2023-09-28 08:39:38.753 [srv 28332] [Trace] trying path C:\Program Files (x86)\checkmk\service
2023-09-28 08:39:38.757 [srv 28332] Found root config on path C:\Program Files (x86)\checkmk\service\check_mk.yml
2023-09-28 08:39:38.762 [srv 28332] [Trace] Loading 'C:\Program Files (x86)\checkmk\service\check_mk.yml'
2023-09-28 08:39:38.770 [srv 28332] [Trace] Loading 'C:\ProgramData\checkmk\agent\bakery\check_mk.bakery.yml'
2023-09-28 08:39:38.775 [srv 28332] [Trace] C:\ProgramData\checkmk\agent\bakery\check_mk.bakery.yml is absent, return
2023-09-28 08:39:38.780 [srv 28332] [Trace] Loading 'C:\ProgramData\checkmk\agent\check_mk.user.yml'
2023-09-28 08:39:38.788 [srv 28332] [Trace] Target 'folders' is empty, overriding with source
2023-09-28 08:39:38.794 [srv 28332] Loaded Config Files by Agent [2.2.0p3,64bit,release,Jun 13 2023,11:15:49] @ 'Win10-64 desktop'
    root:   'C:\Program Files (x86)\checkmk\service\check_mk.yml' size=12750 [OK]
    bakery: 'C:\ProgramData\checkmk\agent\bakery\check_mk.bakery.yml' size=0 [OK]
    user:   'C:\ProgramData\checkmk\agent\check_mk.user.yml' size=477 [OK]
2023-09-28 08:39:38.802 [srv 28332] [Trace] Enabled Debug
2023-09-28 08:39:38.806 [srv 28332] Loaded start config 'C:\Program Files (x86)\checkmk\service\check_mk.yml','C:\ProgramData\checkmk\agent\bakery','C:\ProgramData\checkmk\agent\check_mk.user.yml'
2023-09-28 08:39:38.812 [srv 28332] [Trace] Successful start of thread
2023-09-28 08:39:38.816 [srv 28332] The network is available
2023-09-28 08:39:38.822 [srv 28332] starting controller
2023-09-28 08:39:38.827 [srv 28332] try to kill
2023-09-28 08:39:42.765 [srv 26108] [Trace] Enabled Base
2023-09-28 08:39:42.787 [srv 26108] [Trace] Setting root. service: 'CheckMkService', preset: ''
2023-09-28 08:39:42.792 [srv 26108] [Trace] Try service: 'CheckMkService'
2023-09-28 08:39:42.797 [srv 26108] [Trace] Try registry 'CheckMkService'
2023-09-28 08:39:42.802 [srv 26108] [Trace] Service is found 'C:\Program Files (x86)\checkmk\service\check_mk_agent.exe'
2023-09-28 08:39:42.807 [srv 26108] Set root 'C:\Program Files (x86)\checkmk\service' from registry 'CheckMkService'
2023-09-28 08:39:42.811 [srv 26108] [Trace] Try registry 'CheckMkService'
2023-09-28 08:39:42.816 [srv 26108] Protect file from User write 'C:\Program Files (x86)\checkmk\service\check_mk_agent.exe'
2023-09-28 08:39:42.821 [srv 26108] Protect path from User access 'C:\Program Files (x86)\checkmk\service'
2023-09-28 08:39:42.854 [srv 26108] [Trace] Using root = 'C:\Program Files (x86)\checkmk\service' and data = 'C:\ProgramData\checkmk\agent' folders 
2023-09-28 08:39:42.864 [srv 26108] COM Initialized
2023-09-28 08:39:42.869 [srv 26108] COM initialized
2023-09-28 08:39:42.874 [srv 26108] Found root config on path C:\Program Files (x86)\checkmk\service\check_mk.yml
2023-09-28 08:39:42.892 [srv 26108] [Trace] Enabled Debug
2023-09-28 08:39:42.897 [srv 26108] Loaded start config 'C:\Program Files (x86)\checkmk\service\check_mk.yml','C:\ProgramData\checkmk\agent\bakery','C:\ProgramData\checkmk\agent\check_mk.user.yml'
2023-09-28 08:39:42.902 [srv 26108] service to run
2023-09-28 08:39:42.910 [srv 26108] Service Main
2023-09-28 08:39:42.915 [srv 26108] Service handlers registered
2023-09-28 08:39:42.920 [srv 26108] [Trace] Installing cap file 'C:\Program Files (x86)\checkmk\service\install\plugins.cap'
2023-09-28 08:39:42.925 [srv 26108] Timestamp OK, checking file content...
2023-09-28 08:39:42.930 [srv 26108] [Trace] Installing of CAP file is not required
2023-09-28 08:39:42.935 [srv 26108] [Trace] Installing yml file 'C:\Program Files (x86)\checkmk\service\install\check_mk.install.yml'
2023-09-28 08:39:42.940 [srv 26108] Target File 'C:\ProgramData\checkmk\agent\install\check_mk.install.yml' is absent, reinstall is mandatory
2023-09-28 08:39:42.945 [srv 26108] Reinstalling 'C:\ProgramData\checkmk\agent\install\check_mk.install.yml' with 'C:\Program Files (x86)\checkmk\service\install\check_mk.install.yml'
2023-09-28 08:39:42.949 [srv 26108] This Option/YML installation form MSI is ENABLED
2023-09-28 08:39:42.954 [srv 26108] Remove 'C:\ProgramData\checkmk\agent\install\check_mk.install.yml' [OK]
2023-09-28 08:39:42.960 [srv 26108] Supplied yaml 'C:\Program Files (x86)\checkmk\service\install\check_mk.install.yml' will not be installed
2023-09-28 08:39:42.965 [srv 26108] [Trace] Copy file 'C:\Program Files (x86)\checkmk\service\install\checkmk.dat' to 'C:\ProgramData\checkmk\agent\install\checkmk.dat'
2023-09-28 08:39:42.970 [srv 26108] Timestamp OK, checking file content...
2023-09-28 08:39:42.975 [srv 26108] [Trace] Copy is not required, the file is already exists
2023-09-28 08:39:42.980 [srv 26108] Skip installing user yml file
2023-09-28 08:39:42.985 [srv 26108] Timestamp OK, checking file content...
2023-09-28 08:39:42.991 [srv 26108] Starting upgrade(migration) process...
2023-09-28 08:39:42.996 [srv 26108] [Trace] Legacy Agent not found Upgrade is not possible
2023-09-28 08:39:43.002 [srv 26108] [Trace] trying path C:\Program Files (x86)\checkmk\service
2023-09-28 08:39:43.007 [srv 26108] Found root config on path C:\Program Files (x86)\checkmk\service\check_mk.yml
2023-09-28 08:39:43.011 [srv 26108] [Trace] Loading 'C:\Program Files (x86)\checkmk\service\check_mk.yml'
2023-09-28 08:39:43.019 [srv 26108] [Trace] Loading 'C:\ProgramData\checkmk\agent\bakery\check_mk.bakery.yml'
2023-09-28 08:39:43.024 [srv 26108] [Trace] C:\ProgramData\checkmk\agent\bakery\check_mk.bakery.yml is absent, return
2023-09-28 08:39:43.029 [srv 26108] [Trace] Loading 'C:\ProgramData\checkmk\agent\check_mk.user.yml'
2023-09-28 08:39:43.037 [srv 26108] [Trace] Target 'folders' is empty, overriding with source
2023-09-28 08:39:43.043 [srv 26108] Loaded Config Files by Agent [2.2.0p3,64bit,release,Jun 13 2023,11:15:49] @ 'Win10-64 desktop'
    root:   'C:\Program Files (x86)\checkmk\service\check_mk.yml' size=12750 [OK]
    bakery: 'C:\ProgramData\checkmk\agent\bakery\check_mk.bakery.yml' size=0 [OK]
    user:   'C:\ProgramData\checkmk\agent\check_mk.user.yml' size=477 [OK]
2023-09-28 08:39:43.051 [srv 26108] [Trace] Enabled Debug
2023-09-28 08:39:43.056 [srv 26108] Loaded start config 'C:\Program Files (x86)\checkmk\service\check_mk.yml','C:\ProgramData\checkmk\agent\bakery','C:\ProgramData\checkmk\agent\check_mk.user.yml'
2023-09-28 08:39:43.061 [srv 26108] [Trace] Successful start of thread
2023-09-28 08:39:43.066 [srv 26108] The network is available
2023-09-28 08:39:43.071 [srv 26108] starting controller
2023-09-28 08:39:43.076 [srv 26108] try to kill

I’m not seeing anything that jumps out at me as a problem.
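
One way to make the loop visible in a log like this is to group the lines by the process ID in the `[srv NNNNN]` tag and look at how long each process lived and what it logged last. The following is only a minimal sketch, assuming the line layout shown in the excerpt above; `summarize_runs` and `sample` are illustrative names, not part of CheckMK.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2023-09-28 08:39:38.827 [srv 28332] try to kill
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[srv (\d+)\] (.*)$"
)

def summarize_runs(lines):
    """Return (pid, first_ts, last_ts, last_message) per service process,
    in order of first appearance in the log."""
    runs = {}
    for line in lines:
        m = LINE_RE.match(line.rstrip())
        if not m:
            # Skip continuation lines (the indented "root:" lines)
            # and anything that is not tagged [srv PID].
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        pid, msg = int(m.group(2)), m.group(3)
        if pid not in runs:
            runs[pid] = [ts, ts, msg]
        else:
            runs[pid][1] = ts
            runs[pid][2] = msg
    return [(pid, first, last, msg) for pid, (first, last, msg) in runs.items()]

# A few lines copied from the excerpt above.
sample = [
    "2023-09-28 08:39:34.664 [srv 15524] try to kill",
    "2023-09-28 08:39:38.515 [srv 28332] [Trace] Enabled Base",
    "2023-09-28 08:39:38.822 [srv 28332] starting controller",
    "2023-09-28 08:39:38.827 [srv 28332] try to kill",
    "2023-09-28 08:39:42.765 [srv 26108] [Trace] Enabled Base",
    "2023-09-28 08:39:43.076 [srv 26108] try to kill",
]

for pid, first, last, msg in summarize_runs(sample):
    lifetime = (last - first).total_seconds()
    print(f"pid {pid}: logged for {lifetime:.3f}s, last message: {msg!r}")
```

If every process ends on the same message, as they do in the excerpt, that message marks the last step the service reaches before it dies.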

Yes, though all of our computers have the same security software running on them, without causing an issue. In fact, the problem computer wasn’t a problem when it was running the 1.6 version of the agent; the issue only cropped up when the 2.2 version was installed. I don’t think the endpoint security would prevent the service from running.

That is strange output.
After your line with “try to kill”, it continues on my system with:

2023-09-28 07:15:17.081 [srv 22032] try to kill
2023-09-28 07:15:17.081 [srv 22032] Processing dir 'C:\ProgramData\checkmk\agent\bin'
2023-09-28 07:15:17.135 [srv 22032] killed 0 processes in 'C:\ProgramData\checkmk\agent\bin'
2023-09-28 07:15:17.187 [srv 22032] Agent controller 'C:\ProgramData\checkmk\agent\bin\cmk-agent-ctl.exe -vv daemon --agent-channel ms/Global\WinAgent_0' started pid [23156]
2023-09-28 07:15:17.233 [ctl:23156] [cmk_agent_ctl][INFO] starting
2023-09-28 07:15:17.264 [ctl:23156] [cmk_agent_ctl][INFO] Loaded config from '"C:\\ProgramData\\checkmk\\agent\\cmk-agent-ctl.toml"', connection registry from '"C:\\ProgramData\\checkmk\\agent\\registered_connections.json"'
2023-09-28 07:15:17.295 [ctl:23156] [cmk_agent_ctl::modes::daemon][INFO] Could not load pre-configured connections from "C:\\ProgramData\\checkmk\\agent\\pre_configured_connections.json": Das System kann die angegebene Datei nicht finden. (os error 2)
2023-09-28 07:15:17.325 [ctl:23156] [cmk_agent_ctl::misc][DEBUG] Sleeping 27s to avoid DDOSing of sites
2023-09-28 07:15:17.356 [ctl:23156] [cmk_agent_ctl::misc][DEBUG] Sleeping 29s to avoid DDOSing of sites
2023-09-28 07:15:17.388 [ctl:23156] [cmk_agent_ctl::modes::pull][INFO] Start listening for incoming pull requests
2023-09-28 07:15:17.420 [ctl:23156] [cmk_agent_ctl::modes::pull][INFO] Listening on [::]:6556 for incoming pull connections (IPv6 & IPv4 if activated)
2023-09-28 07:15:18.198 [srv 22032] Controller has started: firewall to controller
2023-09-28 07:15:18.198 [srv 22032] Firewall mode is set to configure, adding rule...
2023-09-28 07:15:18.199 [srv 22032] Removing all 'Checkmk Agent' app: 'C:\ProgramData\checkmk\agent\bin\cmk-agent-ctl.exe'
2023-09-28 07:15:18.336 [srv 22032] Removed 1 old rules.
2023-09-28 07:15:18.343 [srv 22032] Firewall rule 'Checkmk Agent' had been added successfully for ports [6556]
2023-09-28 07:15:18.343 [srv 22032] Reading module config normal
2023-09-28 07:15:18.343 [srv 22032] Processed [1] module(s)

Something is preventing your agent from proceeding normally.
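
To pin down which step is never reached, the messages from the working log above can be treated as a rough checklist. This is a sketch only: the `MILESTONES` list is taken from this one working excerpt and is an assumption about what a healthy start looks like, not a documented startup sequence, and `first_missing` is an illustrative name.

```python
# Substrings of messages seen, in this order, in the working log excerpt.
MILESTONES = [
    "try to kill",
    "Processing dir",
    "killed ",
    "Agent controller",
    "Controller has started",
    "Firewall rule",
]

def first_missing(lines):
    """Return the first milestone that appears in no log line, or None."""
    for milestone in MILESTONES:
        if not any(milestone in line for line in lines):
            return milestone
    return None

# Lines copied from the failing computer's log.
failing = [
    "2023-09-28 08:39:38.822 [srv 28332] starting controller",
    "2023-09-28 08:39:38.827 [srv 28332] try to kill",
    "2023-09-28 08:39:42.765 [srv 26108] [Trace] Enabled Base",
]

print(first_missing(failing))
```

Run against the whole failing log rather than three lines, the same check would show whether the agent ever gets as far as processing the `bin` directory.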

Well, yours is probably doing a standard system check, while mine is just trying to start up the service, which fails… then it tries again… and again… ad infinitum.

What does your log file look like when the CheckMK service starts up?

So I compared the log of the CheckMK agent on the problem computer with the log file from a computer that works fine, from the start of the CheckMK service. Both of them were the same through the “try to kill” process. At that point, they diverge. The good computer continues on with steps that look like yours. The problem computer starts back at the “[Trace] Enabled Base” step.
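
The same comparison can be done mechanically by stripping the timestamp and process ID from each line, so that only the messages are compared, and reporting the first position where the two logs differ. This is a minimal sketch assuming the `[srv PID]` / `[ctl:PID]` prefixes seen in the excerpts; `normalize` and `first_divergence` are illustrative helper names.

```python
import re

# Timestamp plus "[srv 12345] " or "[ctl:12345] " at the start of a line.
PREFIX_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \[(?:srv|ctl)[ :]\d+\] "
)

def normalize(line):
    """Drop the timestamp and PID so lines from different runs compare equal."""
    return PREFIX_RE.sub("", line.rstrip())

def first_divergence(good, bad):
    """Return (index, good_message, bad_message) at the first difference,
    or None if both logs contain the same messages."""
    g = [normalize(line) for line in good]
    b = [normalize(line) for line in bad]
    for i, (x, y) in enumerate(zip(g, b)):
        if x != y:
            return i, x, y
    if len(g) != len(b):
        i = min(len(g), len(b))
        return i, (g[i] if i < len(g) else None), (b[i] if i < len(b) else None)
    return None

# Two lines from the working log and two from the failing log.
good = [
    r"2023-09-28 07:15:17.081 [srv 22032] try to kill",
    r"2023-09-28 07:15:17.081 [srv 22032] Processing dir 'C:\ProgramData\checkmk\agent\bin'",
]
bad = [
    r"2023-09-28 08:39:38.827 [srv 28332] try to kill",
    r"2023-09-28 08:39:42.765 [srv 26108] [Trace] Enabled Base",
]

print(first_divergence(good, bad))
```

Both logs should start at the same point, for example the first “[Trace] Enabled Base” after a service start, or the reported index will not mean much.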

So, just what does the “try to kill” do?
What happens when the “[Trace] Enabled Base” step occurs?

The problem computer does show the CheckMK service running, just for a moment, and then not, and then running again, and then not, and then… just in a loop.