Agent Updater - certificate verify failed: Hostname mismatch

CMK version: 2.3.0p6
OS version: Docker

Error message:
Version: 2.3.0p6, OS: windows, Update error: HTTPSConnectionPool(host='checkmk_rzrb.xxx.yyy', port=443): Max retries exceeded with url: /rzrb/check_mk/deploy_agent.py (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'checkmk_rzrb.xxx.yyy'. (_ssl.c:1000)")))

Hello
I have the official Docker image running on Docker behind a traefik ingress proxy.
That ingress proxy provides the HTTPS endpoint and supplies a wildcard certificate: *.xxx.yyy
Until the upgrade from 2.2.0p27 this worked without a problem. Now the updater seems to check whether the hostname of the certificate matches, which it never will (but should), since it's a wildcard certificate.

I already tried to supply the certificate via Agent updater rules → Certificates for HTTPS verification. Which does nothing, because the error is not that the certificate would not be trusted, but the hostname mismatch.
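The distinction drawn above (trust failure versus name failure) can be sketched in a few lines of Python. This is only an illustration: the function name `classify_verify_error` is made up here, and it tells the two cases apart purely by the message text that Python's `ssl` module reports, which is an assumption about the wording rather than a stable interface.

```python
def classify_verify_error(message: str) -> str:
    """Roughly sort a certificate verification message into a category.

    "name"  -> the chain was accepted, but the certificate does not cover
               the requested hostname or IP address
    "trust" -> some other verification failure (for example an unknown issuer)
    "other" -> not a certificate verification message at all
    """
    text = message.lower()
    if "hostname mismatch" in text or "ip address mismatch" in text:
        return "name"
    if "certificate verify failed" in text:
        return "trust"
    return "other"


# Message taken from the updater output in this thread.
sample = ("[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: "
          "Hostname mismatch, certificate is not valid for "
          "'checkmk_rzrb.xxx.yyy'. (_ssl.c:1000)")
print(classify_verify_error(sample))
```

A "name" result would explain why adding more certificates to the trust store changes nothing: the name check happens independently of which CA is trusted.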

Any more ideas?

Regards
Beat

This may be an unwanted change, introduced by switching from OpenSSL to the cryptography package.

Even with the “trust-cert” option, this does not work:

updater register -i rzrb -H $env:computername -U cmkadmin -P **** -v --trust-cert

Updated the certificate store "C:\ProgramData\checkmk\agent\config\cas\all_certs.pem" with 1 certificate(s)
Going to register agent at deployment server
Updated the certificate store "C:\ProgramData\checkmk\agent\config\cas\all_certs.pem" with 2 certificate(s)
HTTPSConnectionPool(host='checkmk_rzrb.XXX.YYY', port=443): Max retries exceeded with url: /rzrb/check_mk/register_agent.py (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'checkmk_rzrb.XXX.YYY'. (_ssl.c:1000)")))

The best way to figure out what's going on here is to look at your YAML and see what might be happening (or not happening). Can you provide the YAML you are using? (Possibly also the traefik YAML, as that could affect it as well.)

EDIT: here is another post where I posted my docker swarm yaml for a checkmk site I ran a bit ago: Checkmk docker-compose + traefik

The checkmk.yml part for the site:

services:
  rzrb:
    image: registry.checkmk.com/enterprise/check-mk-enterprise:2.3.0p6
    networks:
      - traefik-public
      - internal
    environment: 
      TZ: "Europe/Zurich"
      CMK_SITE_ID: "rzrb"
      CMK_LIVESTATUS_TCP: "on"
      CMK_PASSWORD: "***"
      MAIL_RELAY_HOST: "smtp.xxx.yyy"
    volumes:
      - rzrb_sites:/omd/sites
      - rzrb_backup:/backup
    deploy:
      placement:
        constraints:
          - node.labels.checkmk_rzrb == true
      labels:
        - traefik.enable=true
        - traefik.docker.network=traefik-public
        - traefik.constraint-label=traefik-public
        - traefik.http.routers.checkmk_rzrb.rule=Host(`checkmk_rzrb.xxx.yyy`)
        - traefik.http.routers.checkmk_rzrb.entrypoints=websecure
        - traefik.http.routers.checkmk_rzrb.tls=true
        - traefik.http.routers.checkmk_rzrb.tls.options=default
        - traefik.http.routers.checkmk_rzrb.service=checkmk_rzrb
        - traefik.http.services.checkmk_rzrb.loadbalancer.server.port=5000
        - traefik.tcp.routers.checkmk_rzrb_agentreceiver.rule=HostSNI(`*`)
        - traefik.tcp.routers.checkmk_rzrb_agentreceiver.entrypoints=agentreceiverrzrb
        - traefik.tcp.routers.checkmk_rzrb_agentreceiver.service=checkmk_rzrb_agentreceiver
        - traefik.tcp.services.checkmk_rzrb_agentreceiver.loadbalancer.server.port=8004

traefik.yml relevant part:

services:
  traefik:
    image: traefik:v2.10
    environment:
      - TZ=Europe/Zurich
    ports:
      # Listen on port 80, default for HTTP, necessary to redirect to HTTPS
      - target: 80
        published: 80
        mode: host
      # Listen on port 443, default for HTTPS
      - target: 443
        published: 443
        mode: host
      # AgentReceiverRzrb
      - target: 8004
        published: 8004
        mode: host

    networks:
      # any other service that needs to be publicly available with HTTPS
      - traefik-public

networks:
  # Use the previously created public network "traefik-public", shared with other
  # services that need to be publicly available via this Traefik
  traefik-public:
    external: true

traefik.toml relevant part:

[entryPoints]
  [entryPoints.web]
    address = ":80"
    [entryPoints.web.http]
    [entryPoints.web.http.redirections]
      [entryPoints.web.http.redirections.entryPoint]
        to = "websecure"
        scheme = "https"

  [entryPoints.websecure]
    address = ":443"

  [entryPoints.agentreceiverrzrb]
    address = ":8004"

dynamic traefik.toml:

[tls.stores]
  [tls.stores.default]
    [tls.stores.default.defaultCertificate]
      certFile = "/certificates/cert.pem"
      keyFile  = "/certificates/cert.key"

[[tls.certificates]]
  certFile = "/certificates/cert.pem"
  keyFile  = "/certificates/cert.key"

This setup has worked for 3 years now. The only things I changed were the version of the checkmk image and the certificate, which I renewed every year. No change was required inside checkmk when I did that.

The agents all auto-updated to 2.3.0p6 after the upgrade, so the setup works with the 2.2.0p27 agent. The problem only occurs once the 2.3.0p6 agent tries to update 4 hours later.

When I open the site checkmk_rzrb.xxx.yyy in the browser, I get the correct wildcard certificate configured in traefik; nothing has changed there.

Regards
Beat

I just tried it again, just to be sure. I uninstalled the agent, deleted ProgramData\checkmk and installed an old client, 2.2.0p18.

Registering Client:


PS C:\Users\administrator> ECHO Y | & "c:\Program Files (x86)\checkmk\service\cmk-agent-ctl.exe" register --hostname $env:computername --server checkmk_rzrb.xxx.yyy:8004 --site rzrb --user agentregistration --password ****
Attempting to register at checkmk_rzrb.xxx.yyy, port 8004. Server certificate details:

PEM-encoded certificate:
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----

Issued by:
        Site 'rzrb' local CA
Issued to:
        rzrb
Validity:
        From Tue, 14 Jun 2022 07:47:48 +0000
        To   Sun, 15 Oct 3020 07:47:48 +0000

Do you want to establish this connection? [Y/n]
> Registration complete.

Registering updater:

PS C:\Users\administrator> & "c:\Program Files (x86)\checkmk\service\check_mk_agent.exe" updater register -i rzrb -H $env:computername -U cmkadmin -P **** -v
Finalizing installation, please wait.Going to register agent at deployment server
Applying new update URL https://checkmk_rzrb.xxx.yyy/rzrb/check_mk/ from deployment server
Successfully registered agent of host "VWS11" for deployment.
You can now update your agent by running 'check_mk_agent.exe updater -v'
Saved your registration settings to C:\ProgramData\checkmk\agent\config\cmk-update-agent.state.

Update of the agent:

PS C:\Users\administrator> & "c:\Program Files (x86)\checkmk\service\check_mk_agent.exe" updater -v
Starting Update mode as plugin.
Getting target agent configuration for host 'VWS11' from deployment server
Target state (from deployment server):
  Agent available:     True
  Signatures:          1
  Target hash:         65cd49541d312c86
Downloaded agent has size 38420992 bytes.
Signature check OK.
Transferred MSI package to the agent's installation dir.Awaiting upcoming automatic update performed by agent.
<<<cmk_update_agent_status:sep(0)>>>
{"last_check": 1718694982.8431132, "last_update": null, "aghash": null, "pending_hash": "65cd49541d312c86", "update_url": "https://checkmk_rzrb.xxx.yyy/rzrb/check_mk", "trusted_certs": {"0": {"corrupt": false, "not_after": "20510902173204Z", "signature_algorithm": "sha1WithRSAEncryption", "common_name": "beat.guggisberg"}}, "error": null}

Running the updater again with the updated agent:

PS C:\Users\administrator> & "c:\Program Files (x86)\checkmk\service\check_mk_agent.exe" updater -v
Updated the certificate store "C:\ProgramData\checkmk\agent\config\cas\all_certs.pem" with 1 certificate(s)
Starting Update mode as plugin.
Getting target agent configuration for host 'VWS11' from deployment server
Failed to connect to Agent Bakery: HTTPSConnectionPool(host='checkmk_rzrb.xxx.yyy', port=443): Max retries exceeded with url: /rzrb/check_mk/deploy_agent.py (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'checkmk_rzrb.xxx.yyy'. (_ssl.c:1000)")))
Retrying with fallback URL: https://checkmk_rzrb.xxx.yyy/rzrb/check_mk
<<<cmk_update_agent_status:sep(0)>>>
{"last_check": 1718694982.8431132, "last_update": 1718266734.0, "aghash": "65cd49541d312c86", "pending_hash": null, "update_url": "https://checkmk_rzrb.xxx.yyy/rzrb/check_mk", "trusted_certs": {"0": {"corrupt": false, "not_after": "2051-09-02T17:32:04+00:00", "signature_algorithm": "sha1", "common_name": "beat.guggisberg"}}, "error": "HTTPSConnectionPool(host='checkmk_rzrb.xxx.yyy', port=443): Max retries exceeded with url: /rzrb/check_mk/deploy_agent.py (Caused by SSLError(SSLCertVerificationError(1, \"[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'checkmk_rzrb.xxx.yyy'. (_ssl.c:1000)\")))"}
Failed to connect to Agent Bakery: HTTPSConnectionPool(host='checkmk_rzrb.xxx.yyy', port=443): Max retries exceeded with url: /rzrb/check_mk/deploy_agent.py (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'checkmk_rzrb.xxx.yyy'. (_ssl.c:1000)")))
Retrying with fallback URL: https://checkmk_rzrb.xxx.yyy/rzrb/check_mk

What I’m seeing here is that you’re reverse proxying the agent port, which can cause these issues (I’ve seen many people do this before). In my example YAML I expose port 8000 (8004 in your case) without any intervention, because the agent controller handles its own certificates. I no longer use traefik in my setup, but once I exposed port 8000 in my Kubernetes setup instead of having nginx reverse proxy it, my agent updater worked fine.

I know. I built it that way specifically so that the different sites are reachable via HTTPS without an extra port and with a properly signed certificate. When I just use port 8000 (8004), it presents its self-signed certificate.

The problem here is that the agent is no longer able to handle wildcard certificates correctly, which was no problem with the old OpenSSL library.

How can I bring this to the attention of the developers without having to upgrade to the expensive support plan just for one issue?

Ok, did some more testing and could narrow it down to the content of the folder “C:\ProgramData\checkmk\agent\modules\python-3”

I was able to get a running 2.3.0p6 agent when I made a backup of that folder from the 2.2.0p18 agent and replaced the content after the upgrade.

  1. installed old agent 2.2.0p18
  2. copy C:\ProgramData\checkmk\agent\modules\python-3 to backup
  3. run updater → installed 2.3.0p6
  4. deleted C:\ProgramData\checkmk\agent\modules\python-3
  5. copy from backup to C:\ProgramData\checkmk\agent\modules\python-3
  6. run updater →
& "c:\Program Files (x86)\checkmk\service\check_mk_agent.exe" updater -v
Starting Update mode as plugin.
Getting target agent configuration for host 'VWS11' from deployment server
Target state (from deployment server):
  Agent available:     True
  Signatures:          1
  Target hash:         ad265080694db52b
Agent ad265080694db52b already installed.
<<<cmk_update_agent_status:sep(0)>>>
{"last_check": 1718759864.4737267, "last_update": 1718756780.0, "aghash": "ad265080694db52b", "pending_hash": null, "update_url": "https://checkmk_rzrb.xxx.yyy/rzrb/check_mk", "trusted_certs": {"0": {"corrupt": false, "not_after": "2051-09-02T17:32:04+00:00", "signature_algorithm": "sha1", "common_name": "beat.guggisberg"}}, "error": null}

So the problem is not the agent script itself; it is in the upgraded Python package distributed with the agent.

I went ahead and attempted to reproduce this issue on 2.3.0p6 and could not. I used a wildcard cert for my HTTPS and everything worked (even when I accidentally upgraded from a beta release to p6). I also started from scratch and got the same results. The only real difference is that I’m not reverse proxying or load balancing the agent port at all. I am also using the enterprise version as a Docker container, with no modifications to the image.

Based on the testing so far, this seems to be an isolated case.


Thank you for your efforts.

I am still at a loss as to what the issue could be.
Since I soon have to install checkmk on a new swarm cluster, I will let this rest and try a clean setup on that new cluster. I will do this from scratch, since I made some errors years ago when setting up this cluster, like hostnames with underscores in them. That is a force of habit from long ago, when I did not know that these are not allowed but mostly work. And since the setup worked, I didn’t change it.

Regards
Beat
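The underscore remark above may matter more than it looks. A minimal Python sketch, under the assumption (not confirmed anywhere in this thread) that a strict verifier lets the `*` of a wildcard certificate stand only for a single label made of letters, digits and hyphens. The helper names `invalid_labels` and `wildcard_covers` are hypothetical and only illustrate the idea:

```python
import re

# Letters, digits and hyphens only, no leading or trailing hyphen ("LDH" rule).
_LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def invalid_labels(hostname: str) -> list[str]:
    """Return the labels of `hostname` that break the LDH rule."""
    return [label for label in hostname.rstrip(".").split(".")
            if not _LDH_LABEL.fullmatch(label)]


def wildcard_covers(pattern: str, hostname: str, strict: bool = True) -> bool:
    """Check whether a '*.domain' pattern covers `hostname`.

    The wildcard stands for exactly one left-most label. With strict=True
    that label must also satisfy the LDH rule (assumed verifier behaviour).
    """
    if not pattern.startswith("*."):
        return pattern.lower() == hostname.lower()
    first, _, rest = hostname.partition(".")
    if not first or rest.lower() != pattern[2:].lower():
        return False
    return bool(_LDH_LABEL.fullmatch(first)) if strict else True


print(invalid_labels("checkmk_rzrb.xxx.yyy"))
print(wildcard_covers("*.xxx.yyy", "checkmk_rzrb.xxx.yyy"))
print(wildcard_covers("*.xxx.yyy", "checkmk_rzrb.xxx.yyy", strict=False))
print(wildcard_covers("*.xxx.yyy", "checkmk-rzrb.xxx.yyy"))
```

If the newer Python bundled with the agent behaves like the strict variant, a hostname such as checkmk-rzrb.xxx.yyy would be covered by *.xxx.yyy while checkmk_rzrb.xxx.yyy would not, which would also fit the observation that a lenient browser accepts the certificate.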

Hello,

Is there any news on this?

I ran into a similar situation after updating from 2.2.0p34 to 2.3.0p12 (enterprise). All Windows agent updaters, and only those, lost their connection with that “Hostname mismatch” error:

HTTPSConnectionPool(host='monitor1.company.admin', port=443): Max retries exceeded with url: /rz240admin/check_mk/register_agent.py (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'monitor1.company.admin'. (_ssl.c:1000)")))

Checkmk is installed on a Hyper-V VM, with no proxy and a self-signed cert (no wildcard), and has been running for years since version 1.6.?.

Restoring the “python-3” directory from version 2.2.0p34 worked here as well.

Thanks for any hints.

Regards,
Grewishka

Hey

Sadly no news so far.
I still have the problem and have not yet gotten around to finishing the new Docker Swarm cluster to recreate the Checkmk services.

Regards
Beat

Hey,

check out the information for the major update 2.3.0 here

SSL compatibility has some changes. For the first time, Checkmk 2.3.0 provides OpenSSL in version 3.0 instead of the previous 1.1. This change affects almost all components that use Secure Socket Layer or Transport Level Security.

The behavior of the command line parameter --trust-cert of the cmk-update-agent command has been changed. Previously, the entire certificate chain was checked and the highest self-signed certificate found in the hierarchy was trusted; this is usually the root or an intermediate certificate. From Checkmk 2.3.0, only the server certificate is imported and trusted.

This might be the issue you are experiencing. I have the same problem, because I rely on --trust-cert when registering hosts for automatic agent updates and do not provide a certificate via the Agent Bakery (see “Automatic agent updates - Distribute agents and plug-ins automatically”). In this case, from Checkmk 2.3.0 hosts already lose the trust position when the server certificate expires, whereas hosts registered with Checkmk 2.2.0 lose it only when the root or intermediate certificates expire.

Hello agstITc,

Thanks for your reply. I did notice this section in the documentation, but thought it does not apply to my situation because:

  1. I use a self-signed server certificate (no CA cert). *1
  2. I do provide a certificate via the Agent Bakery.
  3. The Linux agent updater accepts the certificate.

Do you think I am missing something?

*1 Self-signed certs seem to be CA certs

I'm not sure, but I also use a self-signed certificate and provide a certificate via the Agent Bakery, and I experience the exact same issue with the Windows agent updater…
Therefore it must have something to do with the hierarchy of certificates, as I used the -t parameter for it to work on version 2.2.0 and below without any issues.

Maybe the problem with self-signed certificates is that they are CA certs, but the updater now wants server certs.

But the error message points to the hostname not the type…

Seems conflicting to me

2.3.0 - 3.3. Update to version 2.3.0 / 4.3. Certificate check for agent updates

“The behavior of the command line parameter --trust-cert of the cmk-update-agent command has been changed. Previously, the entire certificate chain was checked and the highest self-signed certificate found in the hierarchy was trusted; this is usually the root or an intermediate certificate. From Checkmk 2.3.0, only the server certificate is imported and trusted.”

2.3.0 - 6.1. Automatic agent updates / 7.1. The connection over SSL/TLS does not function

“In the HTTPS configuration of the agent updater rule a root certificate must be specified with which the connection to the Checkmk server can be verified. In other words: the certificate chain included in the Checkmk server’s server certificate must be verifiable by the certificate given here. Often the server certificate is specified here instead — this is however not suitable for this purpose.”
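To test outside of the agent whether a given root certificate verifies the server, a small probe with Python's standard `ssl` module can help. This is a sketch under assumptions: `build_context` and `probe` are names invented here, it is not how the agent updater itself is implemented, and switching off the common-name fallback is a guess at how a stricter client might behave.

```python
import socket
import ssl
from typing import Optional


def build_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Create a verifying TLS context.

    If `cafile` is given, only the certificates in that file are trusted;
    otherwise the system default trust store is loaded.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    # Assumption: mimic clients that no longer fall back to the common name
    # when the certificate has no subject alternative name.
    if ssl.HAS_NEVER_CHECK_COMMON_NAME:
        ctx.hostname_checks_common_name = False
    return ctx


def probe(host: str, port: int = 443, cafile: Optional[str] = None) -> str:
    """Try a TLS handshake and report 'ok' or the verification error text."""
    ctx = build_context(cafile)
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        return f"verification failed: {exc.verify_message}"
```

Calling, for example, `probe("monitor1.company.admin", cafile="root-ca.pem")` (the file name is a placeholder) from any machine with Python 3 should show whether the failure is about the issuer or about the hostname.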

I updated from 2.2.0p12 to 2.3.0p19 yesterday and am also experiencing the same problem.

The only thing that surprises me is that only the agents on the Windows systems are affected. The agents on the Linux hosts were updated without any problems.
Reinstalling an agent on a Windows system did not resolve the problem.

I’m also using a self-signed certificate.

I have solved this problem in my case today. The certificate needs to contain not only the ‘common name’ but also the ‘subject alternative name’. Hope this helps.
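A rough way to see whether a certificate falls into this "common name only" case, written against the dictionary format that Python's `SSLSocket.getpeercert()` returns. The function names and the two sample certificates are made up for illustration, and only exact name matches are handled (no wildcards):

```python
def dns_sans(cert: dict) -> list[str]:
    """DNS names from the subjectAltName field of a getpeercert()-style dict."""
    return [value.lower() for kind, value in cert.get("subjectAltName", ())
            if kind == "DNS"]


def common_names(cert: dict) -> list[str]:
    """Common names from the subject field of a getpeercert()-style dict."""
    return [value.lower()
            for rdn in cert.get("subject", ())
            for key, value in rdn
            if key == "commonName"]


def diagnose(hostname: str, cert: dict) -> str:
    """Say whether `hostname` is covered by SAN, by common name only, or not at all."""
    host = hostname.lower()
    if host in dns_sans(cert):
        return "ok: hostname listed as DNS subject alternative name"
    if host in common_names(cert):
        return "common name only: add the hostname as subject alternative name"
    return "mismatch: hostname appears in neither SAN nor common name"


# Hypothetical certificates, shaped like getpeercert() output.
cn_only = {"subject": ((("commonName", "monitor1.company.admin"),),)}
with_san = {
    "subject": ((("commonName", "monitor1.company.admin"),),),
    "subjectAltName": (("DNS", "monitor1.company.admin"),),
}

print(diagnose("monitor1.company.admin", cn_only))
print(diagnose("monitor1.company.admin", with_san))
```

A certificate of the first kind would be expected to trigger the "Hostname mismatch" error on clients that ignore the common name, even though the name looks right when the certificate is viewed in a browser.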
