Agent Registration only worked once

CheckMK Setup

I use OMD (Open Monitoring Distribution) version 2.2.0p16.cre, hosted on Docker. I added my hosts to Checkmk and now want to configure services and monitoring agents. This is my first time using a version above 2.1, though I'm already familiar with older Check_MK versions.

CONTAINER ID   IMAGE                           COMMAND                  CREATED        STATUS                    PORTS                                                                                                                                                                                                                                                        NAMES
ec9af386ec4b   checkmk/check-mk-raw:2.2.0p16   "/docker-entrypoint.…"   2 months ago   Up 21 minutes (healthy)   0.0.0.0:162->162/udp, :::162->162/udp, 0.0.0.0:514->514/tcp, :::514->514/tcp, 0.0.0.0:6563->6563/tcp, :::6563->6563/tcp, 0.0.0.0:514->514/udp, :::514->514/udp, 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp, 6557/tcp, 0.0.0.0:80->5000/tcp, :::80->5000/tcp   monitoring

Error message

I was able to install the agent on both Linux and Windows Server machines, but I also wanted to register it. Registration worked once, on one system, but failed on every system I tried after that.

I always get this error message on Windows Server:

PS C:\Program Files (x86)\checkmk\service> ./cmk-agent-ctl register --hostname PD_PD-RD --server 192.168.190.16 --site pd --user register
Attempting to register at 192.168.190.16, port 8000. Server certificate details:

PEM-encoded certificate:
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----

Issued by:
        Site 'pd' local CA
Issued to:
        pd
Validity:
        From Tue, 12 Dec 2023 09:09:59 +0000
        To   Thu, 12 Dec 3022 09:09:59 +0000

Do you want to establish this connection? [Y/n]
> Y

Please enter password for 'register'
> (I entered the correct password here)

[2024-02-19 15:31:27.323087 +01:00] ERROR [cmk_agent_ctl] src\main.rs:29: Error registering existing host at https://192.168.190.16:8000/pd

Caused by:
    Request failed with code 500 Internal Server Error: Internal Server Error

The user register is a member of the built-in agent_registration role. Giving the user admin privileges or switching to another admin account makes no difference; the error message stays the same.

My servers and the Checkmk host are in the same subnet and can reach each other, so no firewall is blocking traffic in between. The same error also appears on Linux systems.
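To confirm from an agent host that the agent receiver port itself is reachable (not just the web interface on port 80), a minimal Python sketch can attempt a plain TCP connection. The function name port_reachable is hypothetical, and it only checks the TCP handshake, not TLS or HTTP.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False
```

For example, call port_reachable("192.168.190.16", 8000) on one of the servers. In this thread the receiver answered with HTTP 500, which already shows that the connection got through and the failure happened on the server side.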

I only got it working once, on the first machine I tried:

C:\Program Files (x86)\checkmk\service>cmk-agent-ctl.exe status
Version: 2.1.0p30
Agent socket: operational
IP allowlist: any


Connection: 192.168.190.16:8000/pd
        UUID: 0b7d0471-7f0b-4bc6-b1d1-07e041830cff
        Local:
                Connection type: pull-agent
                Certificate issuer: Site 'pd' agent signing CA
                Certificate validity: Mon, 19 Feb 2024 13:08:31 +0000 - Mon, 19 Feb 2029 13:08:31 +0000
        Remote:
                Connection type: pull-agent
                Registration state: operational
                Host name: PD_PD-Prm

Hi, can you check whether the log file “~/var/log/agent-receiver/error.log” contains further information on the “Internal Server Error”?

Hey, the file is empty:

OMD[pd]:~$ cat ~/var/log/agent-receiver/error.log
OMD[pd]:~$ ls -l ~/var/log/agent-receiver/error.log
-rw-r----- 1 pd pd 0 Feb 20 00:00 /omd/sites/pd/var/log/agent-receiver/error.log

Here is my site's configuration:

ADMIN_MAIL:
AGENT_RECEIVER: on
AGENT_RECEIVER_PORT: 8000
APACHE_MODE: own
APACHE_TCP_ADDR: 0.0.0.0
APACHE_TCP_PORT: 5000
AUTOSTART: on
CORE: nagios
LIVESTATUS_TCP: on
LIVESTATUS_TCP_ONLY_FROM: 0.0.0.0 ::/0
LIVESTATUS_TCP_PORT: 6563
LIVESTATUS_TCP_TLS: on
MKEVENTD: on
MKEVENTD_SNMPTRAP: off
MKEVENTD_SYSLOG: off
MKEVENTD_SYSLOG_TCP: off
MULTISITE_AUTHORISATION: on
MULTISITE_COOKIE_AUTH: on
PNP4NAGIOS: on
TMPFS: off
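The settings above look like the output of omd config show, run as the site user. As an illustration with hypothetical helper names, this sketch parses such KEY: value lines and checks that the agent receiver is enabled on the same port the container publishes (8000 in the docker ps output above).

```python
def parse_omd_config(text: str) -> dict[str, str]:
    """Parse "KEY: value" lines, as in the site settings listed above, into a dict."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        # Split at the first colon only: values such as "0.0.0.0 ::/0" contain colons
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def agent_receiver_problems(settings: dict[str, str], published_port: str = "8000") -> list[str]:
    """Compare the agent receiver settings with the port the container publishes."""
    problems: list[str] = []
    if settings.get("AGENT_RECEIVER") != "on":
        problems.append(f"AGENT_RECEIVER is {settings.get('AGENT_RECEIVER')!r}, expected 'on'")
    if settings.get("AGENT_RECEIVER_PORT") != published_port:
        problems.append(
            f"AGENT_RECEIVER_PORT is {settings.get('AGENT_RECEIVER_PORT')!r}, "
            f"but the container publishes {published_port}"
        )
    return problems
```

With the settings listed above, agent_receiver_problems(parse_omd_config(text)) returns an empty list, so the receiver configuration itself is consistent with the Docker port mapping.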

I also tried network_mode: host on my Docker container, but the behavior did not change.

Just to be sure: is there no error message in the older logs (error.log.1, error.log.2.gz, etc.) either? Depending on when you last tried, the entry may already have been rotated into one of those files.

For me, successful registrations are logged in ~/var/log/agent-receiver/agent-receiver.log and failed ones in error.log.
If your error logs are empty, I don’t know where else you could find more information about the “Internal Server Error”.
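Because the rotated copies are easy to overlook, a small Python sketch can scan error.log and all its rotated siblings at once, including the gzip-compressed ones. The function name and the "ERROR" search string are assumptions; the [ERROR] marker matches the entry format quoted in the next post.

```python
import glob
import gzip
import os


def find_errors(log_dir: str, pattern: str = "error.log*", needle: str = "ERROR") -> list[tuple[str, str]]:
    """Return (file name, line) pairs containing needle from error.log and its rotated copies."""
    hits: list[tuple[str, str]] = []
    # Lexicographic order: error.log, error.log.1, error.log.2.gz, ...
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        # Rotated archives such as error.log.2.gz are gzip-compressed
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if needle in line:
                    hits.append((os.path.basename(path), line.rstrip("\n")))
    return hits
```

Run it as the site user, for example find_errors(os.path.expanduser("~/var/log/agent-receiver")). Note that plain string sorting puts error.log.10 before error.log.2.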

Found it. I only copied the most recent entry:

[2024-02-20 11:37:47 +0100] [75] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/omd/sites/pd/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/pd/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/pd/lib/python3.11/site-packages/fastapi/applications.py", line 284, in __call__
    await super().__call__(scope, receive, send)
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/omd/sites/pd/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/omd/sites/pd/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/routing.py", line 443, in handle
    await self.app(scope, receive, send)
  File "/omd/sites/pd/lib/python3.11/site-packages/fastapi/applications.py", line 284, in __call__
    await super().__call__(scope, receive, send)
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/omd/sites/pd/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/omd/sites/pd/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/omd/sites/pd/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/pd/lib/python3.11/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/pd/lib/python3.11/site-packages/fastapi/routing.py", line 167, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/pd/lib/python3.11/site-packages/agent_receiver/endpoints.py", line 110, in register_existing
    _sign_agent_csr(
  File "/omd/sites/pd/lib/python3.11/site-packages/agent_receiver/endpoints.py", line 88, in _sign_agent_csr
    internal_credentials(),
    ^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/pd/lib/python3.11/site-packages/agent_receiver/utils.py", line 76, in internal_credentials
    secret = (users_dir() / INTERNAL_REST_API_USER / "automation.secret").read_text().strip()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/pd/lib/python3.11/pathlib.py", line 1058, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/pd/lib/python3.11/pathlib.py", line 1044, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/omd/sites/pd/var/check_mk/web/automation/automation.secret'

It looks like it can't find /omd/sites/pd/var/check_mk/web/automation/automation.secret. What can I do about it? At the beginning I changed the automation user to password authentication; could this be the cause of the problem?
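The traceback shows where the secret is expected: internal_credentials() in agent_receiver/utils.py reads users_dir() / INTERNAL_REST_API_USER / "automation.secret", which resolved to /omd/sites/pd/var/check_mk/web/automation/automation.secret. A minimal sketch to check that file relative to the site user's home directory (~ is /omd/sites/pd here), assuming the path layout seen in the traceback; the helper name is hypothetical:

```python
from pathlib import Path


def check_automation_secret(site_home: Path, user: str = "automation") -> str:
    """Report whether the automation secret file from the traceback exists and is non-empty."""
    # Same layout as the traceback: <site home>/var/check_mk/web/<user>/automation.secret
    secret_file = site_home / "var" / "check_mk" / "web" / user / "automation.secret"
    if not secret_file.is_file():
        return f"missing: {secret_file}"
    if not secret_file.read_text().strip():
        return f"empty: {secret_file}"
    return f"ok: {secret_file}"
```

Run it as the site user with check_automation_secret(Path.home()). A "missing" result corresponds to the FileNotFoundError above.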

OK, I just tested my theory, and it worked!

I changed the user automation back to "Automation secret for machine accounts", changed the role of my own registration user to "Agent registration user", tried again, and it worked:

PS C:\Program Files (x86)\checkmk\service> .\cmk-agent-ctl.exe status
Version: 2.2.0p16
Agent socket: operational
IP allowlist: any


Connection: 192.168.190.16/pd
        UUID: 28de5292-20c3-44d2-a78e-7dee7152a36d
        Local:
                Connection mode: pull-agent
                Connecting to receiver port: 8000
                Certificate issuer: Site 'pd' agent signing CA
                Certificate validity: Tue, 20 Feb 2024 11:51:25 +0000 - Tue, 20 Feb 2029 11:51:25 +0000
        Remote:
                Connection mode: pull-agent
                Hostname: PD_PD-RD
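Comparing the two status outputs in this thread, the label for the registered host changed between agent versions: 2.1.0p30 prints "Host name:" while 2.2.0p16 prints "Hostname:". A hypothetical helper that extracts the agent version and the host names from either format, assuming the indented Key: value layout shown above:

```python
from typing import Optional


def summarize_status(status_text: str) -> tuple[Optional[str], list[str]]:
    """Return (agent version, remote host names) from cmk-agent-ctl status output."""
    version: Optional[str] = None
    hostnames: list[str] = []
    for raw in status_text.splitlines():
        # Fields are indented "Key: value" pairs; split only at the first colon
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        if key == "Version" and version is None:
            version = value.strip()
        elif key in ("Host name", "Hostname"):  # 2.1 vs 2.2 label
            hostnames.append(value.strip())
    return version, hostnames
```

An empty host list only means that no host name line appeared in the output; that suggests, but does not prove, that the agent has no registered connection.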