Checkmk Let's Encrypt CERT/HTTPS - Certificate chain verification failed

CMK version: Checkmk Enterprise Edition 2.3.0p25
OS version: Ubuntu 22.04.5 LTS

Error message:
I have a web application running on Azure in a VM, and I’ve configured some rules to monitor the certificate validity and ensure the website is reachable via HTTPS. The certificate is issued by Let’s Encrypt, and I’ve added it to Azure KeyVault for use with the Application Gateway.

Here are the monitoring rules I’ve configured, one CERT rule and one HTTPS rule (screenshots not included).

However, I receive daily notifications from Checkmk with the following message:

Summary: Certificate obtained in 59 ms, Certificate chain verification failed: self-signed certificate in certificate chain (WARN), Certificate expires in 36 day(s) (Mar 21 01:05:45 2025 +00:00)

A few minutes to half an hour later, I receive an “OK” notification.

This happens every day for both the HTTPS and CERT rules.


The website is always accessible and the certificate appears valid when I check it, but I still receive these errors. Neither I nor any customers have experienced connectivity issues with the web application.

I’m wondering if Checkmk has any known issues with Let’s Encrypt certificates or if I might have missed a configuration. Could someone please guide me on how to resolve this?

Output of “cmk --debug -vvn hostname”:
The output of cmk --debug -vvn <*****> shows the same error as in the email: “Certificate chain verification failed: self-signed certificate in certificate chain.”

Thank you for your help!

I have many sites with LE certificates and have no issues.
Is it possible that there are other devices (firewalls, for example) between the CMK instance and the tested web service?
It would be strange for SSL inspection to kick in only from time to time, but it is possible.
If you receive the error message and check the chain with openssl from inside the CMK instance, what do you see there?
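
To catch an intermittent problem like this, it can help to log what the server actually presents at short intervals. Below is a minimal sketch, not an official Checkmk tool: `chain_summary` is a hypothetical helper name. It reads `openssl s_client` output on stdin and prints the verify return code plus the issuer of the last certificate in the presented chain. If something on the path is re-signing the traffic, that issuer should change while the error is active.

```shell
# Hypothetical helper: summarize `openssl s_client` output read from stdin.
# Prints the last "Verify return code" and the issuer ("i:" line) of the
# last certificate in the "Certificate chain" section.
chain_summary() {
    local out code issuer
    out=$(cat)
    # s_client may print the return code more than once; keep the last one.
    code=$(printf '%s\n' "$out" | sed -n 's/^ *Verify return code: *//p' | tail -n 1)
    # The last "i:" line belongs to the top-most certificate the server sent.
    issuer=$(printf '%s\n' "$out" | sed -n 's/^ *i:\(.*\)$/\1/p' | tail -n 1)
    printf 'verify=%s | top issuer=%s\n' "${code:-unknown}" "${issuer:-unknown}"
}
```

Assuming the placeholder host name from this thread, it could be run as the site user from a loop or cron job like this:
openssl s_client -connect site.example.com:443 -servername site.example.com </dev/null 2>/dev/null | chain_summary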

@andreas-doehler Sure, I completely agree with you. This behavior is indeed strange, and I haven’t been able to reproduce or explain why or how it happens.

I’ve verified the certificate chain on the machine hosting Checkmk. Here are the details:

:~$ openssl version
OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022)

:~$ openssl s_client -showcerts -connect my.site.com:443
# copy certificate to cert.pem
# copy chain certificate to chain.pem

:~$ openssl verify -CAfile chain.pem cert.pem
cert.pem: OK

The certificate and chain seem to be fine. Let me know if you have any other ideas.
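
The manual copy steps above can also be scripted, which makes it easier to repeat the check at the moment the error shows up. This is only a sketch under my own assumptions: `split_showcerts` is a made-up function name, and the `certN.pem` file naming is arbitrary. It splits the PEM blocks from `openssl s_client -showcerts` output into one file per certificate, with the leaf in `cert0.pem`.

```shell
# Hypothetical helper: split `openssl s_client -showcerts` output (stdin)
# into <outdir>/cert0.pem (leaf), <outdir>/cert1.pem (next in chain), ...
split_showcerts() {
    local outdir="$1"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        # Start a new output file at every PEM header.
        /-----BEGIN CERTIFICATE-----/ { out = sprintf("%s/cert%d.pem", dir, n++); writing = 1 }
        # Copy lines while inside a PEM block (header and footer included).
        writing { print > out }
        # Stop copying after the PEM footer.
        /-----END CERTIFICATE-----/ { writing = 0; close(out) }
    '
}
```

Possible usage, with the host name from above as a placeholder:
openssl s_client -showcerts -connect my.site.com:443 -servername my.site.com </dev/null 2>/dev/null | split_showcerts ./certs
openssl verify -untrusted ./certs/cert1.pem ./certs/cert0.pem
With `-untrusted`, the intermediate only helps to build the chain and the root still has to come from the local trust store, so a swapped-in root should make the verification fail.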

Additionally, I’ve also checked some local web applications and others running behind an application gateway. With these apps, I don’t encounter any issues, even though some use Let’s Encrypt certificates, some are self-signed, and others are purchased certificates. Everything seems to work fine in those cases.

Can you try openssl s_client?

Maybe also after “omd su sitename”?

There’s also an interesting post about check_httpv2.

For me check_http works with letsencrypt:

$ ./versions/2.3.0p24.cre/lib/nagios/plugins/check_http -H mail.example.com  -C 1,1 --sni
OK - Certificate 'mail.example.com' will expire on Tue Apr 22 18:26:44 2025 +0000.

ldd shows that it uses its own OpenSSL: /opt/omd/./versions/2.3.0p24.cre/lib/nagios/plugins/../../../lib/libssl.so.3

@jochum, good input! I’ll wait until the issue occurs, then I’ll try using check_http and check_httpv2 to troubleshoot:

OMD[cmksite]:~$ ./lib/nagios/plugins/check_http -H site.example.com -C 1,1 --sni
OK - Certificate '*.example.com' will expire on Fri Mar 21 01:05:45 2025 +0000.

Additionally, I can confirm that Checkmk is using its own OpenSSL library:

OMD[cmksite]:~/lib$ ldd libssl.so.3
        linux-vdso.so.1 (0x00007ffca0bb8000)
        libstdc++.so.6 => /omd/sites/cmksite/lib/libstdc++.so.6 (0x00007fdb5c886000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fdb5c798000)
        libcrypto.so.3 => /omd/sites/cmksite/lib/libcrypto.so.3 (0x00007fdb5c345000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdb5c11c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fdb5cb8f000)
        libgcc_s.so.1 => /omd/sites/cmksite/lib/libgcc_s.so.1 (0x00007fdb5c0f5000)
OMD[cmksite]:~/lib$ openssl version
OpenSSL 3.0.15 3 Sep 2024 (Library: OpenSSL 3.0.15 3 Sep 2024)

According to this Checkmk 2.4 change (Werk 15520), you’ll be able to ignore the certificate chain issue. However, I’m not fully satisfied with this solution because it merely bypasses the problem rather than solving it.

Edit: Here’s the output from openssl s_client when connecting to the site:

OMD[cmksite]:~$ openssl s_client -connect site.example.com:https
CONNECTED(00000003)
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R11
verify return:1
depth=0 CN = *.example.com
verify return:1
---
Certificate chain
 0 s:CN = *.example.com
   i:C = US, O = Let's Encrypt, CN = R11
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Dec 21 01:05:46 2024 GMT; NotAfter: Mar 21 01:05:45 2025 GMT
 1 s:C = US, O = Let's Encrypt, CN = R11
   i:C = US, O = Internet Security Research Group, CN = ISRG Root X1
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Mar 13 00:00:00 2024 GMT; NotAfter: Mar 12 23:59:59 2027 GMT
---

Can you tell me how you update that certificate? certbot, traefik, cert-manager?

I guess it’s a 3-hour temporary error in that software; that’s why I’m asking.

Maybe it’s even a BUG / Feature in your Azure Update script? :stuck_out_tongue:

EDIT: This is what I use to export from Traefik to Kubernetes: jochumdev/acmejson-to-secret on GitHub, a Docker container that transforms Traefik acme.json into kube secrets (written by me).

@jochum I understand, but I’m pretty sure that’s not the issue. Due to the problem, I’m manually updating the certificate. I use Certbot to generate the certificate, then create a PFX file with the key and fullchain file (since Microsoft only allows PFX files). This PFX file is then uploaded to Azure KeyVault. I’ve been uploading certificates for other websites without any issues, and this is the only one causing a problem. Azure KeyVault only allows uploads of valid certificates.
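
For reference, the Certbot-to-PFX step described above can be sketched roughly as follows. The function names `count_pem_certs` and `build_pfx` are made up for illustration; `privkey.pem` and `fullchain.pem` are the file names Certbot writes for each certificate. Counting the certificates first is a cheap way to confirm that the intermediate really is in the file that goes into the PFX.

```shell
# Hypothetical helper: count the certificates in a PEM bundle.
# A Certbot fullchain.pem should contain the leaf plus at least one intermediate.
count_pem_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Hypothetical helper: bundle key and full chain into a PFX (PKCS#12) file.
# openssl prompts interactively for the export password.
build_pfx() {
    local key="$1" fullchain="$2" out="$3"
    openssl pkcs12 -export -inkey "$key" -in "$fullchain" -out "$out"
}
```

Example: `count_pem_certs fullchain.pem` followed by `build_pfx privkey.pem fullchain.pem site.pfx`, where `site.pfx` is just a placeholder output name.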

To double-check the certificate validity, I’ve also used/tested lego-acme.

So, there can’t be a bug in the Azure update script. I’m not sure what you mean by a "3 hour temporary error in that software"; the certificate is managed by the Azure Application Gateway, not by the software or a script. (It’s all Microsoft bull***t :slightly_smiling_face:)

About the 3 hours: your initial screenshot actually shows a !6! hour monitoring outage (I was thinking 3 at first).

Is the Checkmk server that you are using to monitor this app also on an Azure VM?
Maybe you can create a simple local check (for example, using curl to check the endpoint) directly on the Azure VM where your web application runs, and see if that fails as well?
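
A minimal sketch of such a local check, under a few assumptions of mine: the function names and the service name "HTTPS endpoint" are invented, any non-zero curl exit code is mapped to CRIT, and the URL is a placeholder. The output follows the Checkmk local check line format (state, quoted service name, metrics or "-", detail text).

```shell
# Hypothetical helper: format one Checkmk local check line.
# A curl exit code of 0 becomes state 0 (OK), anything else state 2 (CRIT).
local_check_line() {
    local exit_code="$1" service="$2" detail="$3" state
    if [ "$exit_code" -eq 0 ]; then state=0; else state=2; fi
    printf '%s "%s" - %s\n' "$state" "$service" "$detail"
}

# Hypothetical helper: probe a URL with curl and report the result.
# curl verifies the certificate chain by default; exit code 60 means the
# peer certificate could not be verified against the known CAs.
check_https_endpoint() {
    local url="$1" rc=0
    curl --silent --fail --max-time 10 --output /dev/null "$url" || rc=$?
    local_check_line "$rc" "HTTPS endpoint" "curl exit code $rc for $url"
}
```

A script that calls `check_https_endpoint https://site.example.com/` could then be placed in the agent's local check directory (usually /usr/lib/check_mk_agent/local on Linux) on the VM. If this check stays OK while the CERT and HTTPS services flap, the problem is more likely on the path between the Checkmk server and Azure than on the application side.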

Update! Update!! Update!!!

After analyzing the traffic and routing, I found the issue. Checkmk is working perfectly (very well), but I can’t say the same about my internet service provider. :worried:

The issue was resolved after I created an SD-WAN rule to route my traffic through another ISP, and since then everything has been working great.
