CMK-Agent Bakery - Certificate Verify Error

Hello everyone,

We are trying to configure the agent bakery in our agents with HTTPS but we are getting an error that cannot verify the certificate. We placed the certificate in /etc/pki/ca-trust/source/anchors/certificate.pem and testing it works fine with HTTPS but when registering the agent we got the error written below.

We appreciate your help with this error!!

CMK version:
2.0.0p31 (CEE)

OS version:
NAME="Red Hat Enterprise Linux Server"
VERSION="7.5 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.5"

Error message:
$ cmk-update-agent register --server $CMK_SERVER --site $CMK_SITE --host $CMK_HOST --protocol $CMK_PROTOCOL --user $CMK_USR --secret $CMK_PWD -v --trust-cert
Missing config file at /etc/check_mk/cmk-update-agent.cfg. Configuration may be incomplete.
Going to register agent at deployment server
Trying to import certificate from the server's certificate chain but found no self-signed certificate or CA certificate. Aborting import.
HTTPSConnectionPool(host='$HOSTNAME', port=443): Max retries exceeded with url: /$CMK_SITE/check_mk/register_agent.py (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
See syslog for details.

There is a difference in documented and thus expected behavior (yes, call it a bug). The agent updater currently only obeys certificates in the bundled certifi bundle or alternative certificates baked in via bakery configuration.

We are working on this. But it seems there is no quick solution in sight, since a lot of environments have to be considered (for example Python update script vs. bundled updater, Linux vs. Windows and so on).

Please use the first point in “Certificates via Agent Bakery” as workaround for now:

We’ll update “Certificate Store” as soon as we can guarantee consistent behavior across all operating systems.

Hello,

Thanks for the information, we are still having some problems with the certificate validation. We chose to use the second option, which is the one where we configure the certificate in the agent update rule, but when we bake the agent and update the certificate it fails to verify it:

requests.exceptions.SSLError: HTTPSConnectionPool(host='SLAVE01', port=443): Max retries exceeded with url: /Site1/check_mk/deploy_agent.py (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
2023-03-13 09:50:10,767 ERROR: HTTPSConnectionPool(host='SLAVE01', port=443): Max retries exceeded with url: /Site1/check_mk/deploy_agent.py (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))

But when we try to test with openssl or curl, we successfully connect to the Slave and the certificate is verified with 0 errors:

root@hostname:/etc/pki/tls/certs$ openssl s_client -connect SLAVE01:443
CONNECTED(00000003)
depth=1 C = PT, O = ESI, OU = SEGURANCA, CN = CERTIFICATE
verify return:1
depth=0 C = PT, ST = Lisboa, L = Lisboa, O = CLIENT, OU = CLIENT, CN = SLAVE01
verify return:1

root@SL000103:/etc/pki/tls/certs$ curl https://SLAVE01/Site1/check_mk/deploy_agent.py --include --noproxy '*'
HTTP/1.1 200 OK
Date: Mon, 13 Mar 2023 10:07:10 GMT
Server: Apache

Am I doing something wrong? Thanks if you can help.

Did you test with 2.0.0p34 or 2.1.0p24? These contain the updated Certifi, see werk 15068 (Fix improper certificate validation in agent updater). If it does not, I'll take a look and probably get back to you for further details.

Hi,

We are using 2.0.0p31. Would the problem be solved if we update the agent to 2.0.0p34?

Thanks

It should be solved when baking p34 agent updater packages. If it’s not in your case, we have to compare certificate chains used.

Hello,

We updated the slaves and agents to 2.0.0p34 and we still have the same problem with the certificate validation:

root@HOSTNAME:~$ cmk-update-agent -vv
Successfully read /etc/cmk-update-agent.state.
Successfully read /etc/check_mk/cmk-update-agent.cfg.
Updating the certificate store "/var/lib/check_mk_agent/cas/all_certs.pem"...
Updated the certificate store "/var/lib/check_mk_agent/cas/all_certs.pem" with 1 certificate(s)

+-------------------------------------------------------------------+
|                                                                   |
|  Checkmk Agent Updater v2.0.0p34 - Update                         |
|                                                                   |
+-------------------------------------------------------------------+
Getting target agent configuration for host 'AGENTHOSTNAME' from deployment server
Fetching content (using requests): https://SLAVE/SITE/check_mk/deploy_agent.py
Failed to connect to agent bakery: HTTPSConnectionPool(host='SLAVE', port=443): Max retries exceeded with url: /SITE/check_mk/deploy_agent.py (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
Retrying with fallback URL: http://SLAVE/SITE/check_mk
Fetching content (using requests): http://SLAVE/SITE/check_mk/deploy_agent.py
Response without json Content-Type
**Unexpected answer from Checkmk server: Missing json data. Maybe we are talking to an agent bakery from before Checkmk 2.0 ?**
See syslog or Logfile at /var/lib/check_mk_agent/cmk-update-agent.log for details.

After the patch, we started to receive a new message (identified above in bold). I searched the forum for the problem and found a solution which was to remove the host in the check-mk console and uninstall and reinstall the agent. This procedure solved the problem for a while, but it reappeared when I ran cmk-update-agent -vv a second time.

Another thing I noticed was that even with a fresh install of the agent without certificate validation configured in the auto-update rule, cmk-update-agent would still fail due to certificate validation:

Hello @mschlenker

Have you had a chance to see this problem? We can proceed with the certificate comparison step.

Thanks

I am waiting for input from someone who internally ran into the same problem. I'll ask them on Monday how they proceeded.

Hello @mschlenker

Any news?

Thanks

Hi @JoaoCampos !

It’s hard to tell without some further knowledge about your setup.

As it’s working with openssl/curl, there seems to be a root certificate available on the system that can be used to verify the certificate chain of the Checkmk server.
It’s hard to tell where exactly it comes from.

However, the certificate rolled out by the agent updater ruleset seems to be insufficient.

You can analyze this by having a look at the server's certificate chain. You already entered the right command with openssl s_client -connect SLAVE01:443.
The interesting part here is the certificate chain, which comes right after the listing that you posted.

Now, this chain has to match the certificate rolled out with the agent updater rule.
Assuming that the certificate is available as file certificate.pem, please

  • have a look at it with
    openssl x509 -in certificate.pem -text
    The most interesting parts here are Issuer and Subject.
    They should be the same, i.e., it should be a root certificate.
    And this issuer/subject should match one subject (s:) of one entry of the server’s certificate chain.
    (In many cases the last one)
  • test if this certificate can verify the chain:
    openssl s_client -connect SLAVE01:443 -CAfile certificate.pem
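The comparison described above, as an added shell example that is not from the thread or from Checkmk: it checks that the PEM destined for the agent updater rule is a self-signed root, that the server's leaf certificate names it as issuer, and that the chain verifies with only that file, ignoring the system trust store the way the updater does (which is why curl and openssl can succeed while the updater fails). The certificates are throwaway test material generated on the fly; the -no-CAfile/-no-CApath flags assume OpenSSL 1.1.0 or newer.

```shell
#!/bin/sh
# Added example: offline checks mirroring the manual openssl inspection.
set -eu
command -v openssl >/dev/null 2>&1 || { echo "openssl CLI required" >&2; exit 0; }

# Strip the "subject="/"issuer=" prefix so the two fields compare cleanly.
name_of() { openssl x509 -in "$1" -noout "-$2" | sed "s/^$2= *//"; }

# True when Subject equals Issuer, i.e. the file is a self-signed root.
is_root_cert() { [ "$(name_of "$1" subject)" = "$(name_of "$1" issuer)" ]; }

# True when the leaf's Issuer matches the CA file's Subject (the "s:" line of
# one chain entry in s_client output must equal the CA's subject).
leaf_issued_by() { [ "$(name_of "$2" issuer)" = "$(name_of "$1" subject)" ]; }

# True when the CA file alone verifies the leaf. The system store is switched
# off on purpose: the agent updater only trusts what the rule rolled out.
chain_verifies() {
    openssl verify -no-CAfile -no-CApath -CAfile "$1" "$2" >/dev/null 2>&1
}

# Constructed test material: a root CA, a server cert it signed, and an
# unrelated self-signed cert standing in for a wrong certificate.pem.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/O=ESI/CN=CERTIFICATE" \
    -keyout "$WORK/ca.key" -out "$WORK/ca.pem" 2>/dev/null
openssl req -newkey rsa:2048 -nodes -subj "/O=CLIENT/CN=SLAVE01" \
    -keyout "$WORK/srv.key" -out "$WORK/srv.csr" 2>/dev/null
openssl x509 -req -in "$WORK/srv.csr" -CA "$WORK/ca.pem" -CAkey "$WORK/ca.key" \
    -CAcreateserial -days 1 -out "$WORK/srv.pem" 2>/dev/null
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/O=OTHER/CN=WRONG" \
    -keyout "$WORK/wrong.key" -out "$WORK/wrong.pem" 2>/dev/null

# Report the three checks for the right CA file and for the wrong one.
for ca in "$WORK/ca.pem" "$WORK/wrong.pem"; do
    printf '%s: root=%s issues-leaf=%s verifies=%s\n' "$(name_of "$ca" subject)" \
        "$(is_root_cert "$ca" && echo yes || echo no)" \
        "$(leaf_issued_by "$ca" "$WORK/srv.pem" && echo yes || echo no)" \
        "$(chain_verifies "$ca" "$WORK/srv.pem" && echo yes || echo no)"
done
```

A "no" in the last column for the file you placed in the rule is the same failure the updater reports, independent of what the system store happens to trust.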