File ca-certificates.crt on the master site becomes empty when activating changes

CEE 2.0.0p37
Debian Linux 11.7 (5.10.179.2)

I have a master and 8 slave sites. The master site replicates the configuration to the slaves, and the communication between them is encrypted. The setting “Trusted certificate authorities for SSL” is at its factory default (“Use system wide CA”). It had been working for many months without issues.

For a few days now, the file ~/var/ssl/ca-certificates.crt on the master has been 0 bytes after I activate changes. It looks like the file gets truncated or overwritten with empty content. This causes the connections to the slaves to fail, because their certificates cannot be verified without the CA certificates. If I manually add the appropriate certificates to the empty file and re-run the activation, the activation works again until the next time.
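A quick way to notice the truncation as soon as it happens is to count the PEM certificate blocks in the bundle after each activation. This is a minimal sketch, not something Checkmk ships; the helper names `count_pem_certs` and `check_bundle` are hypothetical, and it assumes it runs as the site user so that `~` resolves to the site directory.

```python
import os
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"


def count_pem_certs(path: str) -> int:
    """Count the PEM certificate blocks in a CA bundle (0 for an empty file)."""
    text = Path(path).read_text(encoding="ascii", errors="replace")
    return text.count(PEM_BEGIN)


def check_bundle(path: str) -> str:
    """Classify the bundle as MISSING, EMPTY (no certificates) or OK with a count."""
    if not os.path.exists(path):
        return "MISSING"
    count = count_pem_certs(path)
    if count == 0:
        return "EMPTY"
    return f"OK ({count} certificates)"


if __name__ == "__main__":
    # Assumption: executed as the site user, so "~" is /omd/sites/<sitename>.
    print(check_bundle(os.path.expanduser("~/var/ssl/ca-certificates.crt")))
```

Counting the BEGIN markers is only a cheap plausibility check; it does not validate the certificates themselves.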

I do not know the root cause of the issue, nor can I see why the file becomes empty. The permissions are OK. An update from 2.0.0p35 to 2.0.0p37 didn’t improve anything.
The partition has lots of free space.
The file ~/var/log/web.log contains many SSL-related errors that are generated during the activation of changes because certificate verification fails, but there is no hint why the file ca-certificates.crt becomes empty.

My current workaround: I removed the write permission for the master site user on the directory ~/var/ssl. The activation process displays a warning that the file could not be written, but everything else works fine.
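The workaround above boils down to clearing the write bits on the directory (in a shell this is a plain `chmod`). Below is a minimal Python sketch with hypothetical helper names; it returns the previous mode so the change can be reverted later. Note that it only blocks creating and renaming entries inside ~/var/ssl; the existing file itself stays writable by the site user.

```python
import os
import stat

# Write permission for owner, group and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def drop_write_bits(directory: str) -> int:
    """Remove all write bits from `directory` and return its previous mode."""
    old_mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, old_mode & ~WRITE_BITS)
    return old_mode


def restore_mode(directory: str, mode: int) -> None:
    """Revert the workaround by putting the saved mode back."""
    os.chmod(directory, mode)
```

Usage, assuming it runs as the site user: `previous = drop_write_bits(os.path.expanduser("~/var/ssl"))`, and later `restore_mode(os.path.expanduser("~/var/ssl"), previous)`.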

Any ideas? Any advice?

Best regards,
Hermann M.

The update to 2.2.0p7 that we installed today didn’t resolve the issue.

Does anyone have an idea how to get rid of the issue?
BR
Hermann M.

Hey,

We are facing the same issue. We have moved our Checkmk instances from version 1.5 → 1.6 → 2.0 → 2.1 → 2.2.0p7, and in version 2.2.0p7 we enabled the TLS option in our distributed sites configuration. It works initially, but the file ca-certificates.crt becomes empty after activating other changes. So we did the same and removed the write permissions.


Checking with inotifywait gives the following result:

inotifywait -m /omd/sites/<sitename>/var/ssl/ca-certificates.crt
Setting up watches.
Watches established.
/omd/sites/<sitename>/var/ssl/ca-certificates.crt OPEN
/omd/sites/<sitename>/var/ssl/ca-certificates.crt ACCESS
/omd/sites/<sitename>/var/ssl/ca-certificates.crt ACCESS
/omd/sites/<sitename>/var/ssl/ca-certificates.crt CLOSE_NOWRITE,CLOSE
/omd/sites/<sitename>/var/ssl/ca-certificates.crt OPEN
/omd/sites/<sitename>/var/ssl/ca-certificates.crt ACCESS
/omd/sites/<sitename>/var/ssl/ca-certificates.crt ACCESS
/omd/sites/<sitename>/var/ssl/ca-certificates.crt CLOSE_NOWRITE,CLOSE
/omd/sites/<sitename>/var/ssl/ca-certificates.crt OPEN
/omd/sites/<sitename>/var/ssl/ca-certificates.crt ACCESS
/omd/sites/<sitename>/var/ssl/ca-certificates.crt ACCESS
/omd/sites/<sitename>/var/ssl/ca-certificates.crt ACCESS
/omd/sites/<sitename>/var/ssl/ca-certificates.crt CLOSE_NOWRITE,CLOSE
/omd/sites/<sitename>/var/ssl/ca-certificates.crt OPEN
/omd/sites/<sitename>/var/ssl/ca-certificates.crt CLOSE_NOWRITE,CLOSE
/omd/sites/<sitename>/var/ssl/ca-certificates.crt ATTRIB
/omd/sites/<sitename>/var/ssl/ca-certificates.crt CLOSE_NOWRITE,CLOSE
/omd/sites/<sitename>/var/ssl/ca-certificates.crt DELETE_SELF
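The final DELETE_SELF means the watched inode itself went away, i.e. the path now points to a different file, and inotifywait receives no further events for that watch. Watching the directory ~/var/ssl instead of the file would also catch the replacement. The same effect can be shown by comparing the inode number and size behind the path before and after. This is a minimal sketch with hypothetical names (`snapshot`, `describe_change`); it simulates the replacement with `os.replace` and does not reproduce what Checkmk actually does.

```python
import os
from typing import NamedTuple


class FileState(NamedTuple):
    inode: int
    size: int


def snapshot(path: str) -> FileState:
    """Record the inode number and size currently behind `path`."""
    st = os.stat(path)
    return FileState(st.st_ino, st.st_size)


def describe_change(before: FileState, after: FileState) -> str:
    """Tell an in-place modification apart from a replacement of the file."""
    if before.inode != after.inode:
        kind = "replaced (new inode, e.g. via rename)"
    elif before.size != after.size:
        kind = "modified in place"
    else:
        return "unchanged"
    return f"{kind}, size {before.size} -> {after.size} bytes"


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "ca-certificates.crt")
        with open(target, "w") as f:
            f.write("-----BEGIN CERTIFICATE-----\n")
        before = snapshot(target)
        # Simulate a writer that creates an empty temp file and renames it over the target.
        tmp = os.path.join(d, ".ca-certificates.crt.tmp")
        open(tmp, "w").close()
        os.replace(tmp, target)
        print(describe_change(before, snapshot(target)))
```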

This seems to be the result of Apache replacing the file, according to the audit log:

time->Tue Sep 19 08:10:55 2023
type=PROCTITLE msg=audit(1695103855.141:240262): proctitle=2F7573722F7362696E2F6874747064002D66002F6F6D642F73697465732F76696572636F6D2F6574632F6170616368652F6170616368652E636F6E66
type=PATH msg=audit(1695103855.141:240262): item=4 name="/omd/sites/<sitename>/var/ssl/ca-certificates.crt" inode=145986946 dev=fd:07 mode=0100660 ouid=994 ogid=1023 rdev=00:00 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1695103855.141:240262): item=3 name="/omd/sites/<sitename>/var/ssl/ca-certificates.crt" inode=145986947 dev=fd:07 mode=0100660 ouid=994 ogid=1023 rdev=00:00 nametype=DELETE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1695103855.141:240262): item=2 name="/omd/sites/<sitename>/var/ssl/.ca-certificates.crt.news23o6d17" inode=145986946 dev=fd:07 mode=0100660 ouid=994 ogid=1023 rdev=00:00 nametype=DELETE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1695103855.141:240262): item=1 name="/omd/sites/<sitename>/var/ssl/" inode=145986996 dev=fd:07 mode=040755 ouid=994 ogid=1023 rdev=00:00 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1695103855.141:240262): item=0 name="/omd/sites/<sitename>/var/ssl/" inode=145986996 dev=fd:07 mode=040755 ouid=994 ogid=1023 rdev=00:00 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1695103855.141:240262): cwd="/"
type=SYSCALL msg=audit(1695103855.141:240262): arch=c000003e syscall=82 success=yes exit=0 a0=7f5f9dddf230 a1=7f5f65b53250 a2=7f5fbafd3cb8 a3=7f5fad0158a6 items=5 ppid=28479 pid=4165730 auid=0 uid=994 gid=1023 euid=994 suid=994 fsuid=994 egid=1023 sgid=1023 fsgid=1023 tty=(none) ses=1 comm="httpd" exe="/usr/sbin/httpd" key="cert_watch"
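The raw audit record can be decoded to confirm which process did what. auditd hex-encodes the PROCTITLE value when it contains NUL bytes or other special characters, and the SYSCALL record only gives a number; on x86_64 (arch=c000003e) syscall 82 is rename(2). The sketch below (hypothetical helper `decode_proctitle`, partial syscall table) decodes the PROCTITLE above to `/usr/sbin/httpd -f /omd/sites/<sitename>/etc/apache/apache.conf`. Read together with the PATH items, the site's Apache (uid 994) renamed the hidden temporary file .ca-certificates.crt.news23o6d17 over ca-certificates.crt, which suggests the new content was already empty when it was written to the temporary file.

```python
from typing import List

# Partial x86_64 syscall table (arch=c000003e in the SYSCALL record), just enough for this log.
X86_64_SYSCALLS = {82: "rename", 87: "unlink", 257: "openat", 264: "renameat", 316: "renameat2"}


def decode_proctitle(hex_value: str) -> List[str]:
    """Turn an auditd PROCTITLE hex string back into the argv list.

    The NUL bytes in the decoded value separate the individual arguments.
    """
    raw = bytes.fromhex(hex_value)
    return [arg.decode("utf-8", errors="replace") for arg in raw.split(b"\x00") if arg]


if __name__ == "__main__":
    proctitle = ("2F7573722F7362696E2F6874747064002D66002F6F6D642F73697465732F"
                 "76696572636F6D2F6574632F6170616368652F6170616368652E636F6E66")
    print("command:", " ".join(decode_proctitle(proctitle)))
    print("syscall 82:", X86_64_SYSCALLS[82])
```

`ausearch -i` performs the same interpretation directly on the audit log.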

Strangely, we are also having issues with the Teams notifications:

requests.exceptions.SSLError: HTTPSConnectionPool(host='<companyname>.webhook.office.com', port=443): Max retries exceeded with url: /webhookb2/55af4a75-0c3f-4bfc-88d7-43b93e3d3efb@0e603135-2ea1-4694-89f4-5c1e8703c2d4/IncomingWebhook/aa6f927aa7044d728b8a501c637be493/765ae110-4de5-46ec-83ed-2b5ad45897d3 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)')))

and with submitting the license information:

Last verification failed (Mode: Manual online verification, Date: 2023-09-12 15:25:32):
[Error] Connection with licensing server (https://license.checkmk.com/api/verify) failed. You need to make sure that your Checkmk can reach the license server. Please check your firewall and proxy settings.

Details: HTTPSConnectionPool(host='license.checkmk.com', port=443): Max retries exceeded with url: /api/verify (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)')))
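“unable to get local issuer certificate” is what OpenSSL reports when the trust store holds no CA for the peer's chain, which is what an empty bundle produces for every HTTPS target. Assuming the notification script and the license check read the same site CA bundle (the thread does not confirm this), the following sketch with a hypothetical helper `trusted_ca_count` shows how many CA certificates Python's ssl module can actually load from it.

```python
import os
import ssl


def trusted_ca_count(cafile: str) -> int:
    """Return how many CA certificates OpenSSL accepted from `cafile`.

    Returns 0 when the file is missing, empty or not parseable, i.e. when
    certificate verification against this bundle cannot succeed.
    """
    # A bare client context: it does not load any system CAs on its own.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        ctx.load_verify_locations(cafile=cafile)
    except OSError:  # ssl.SSLError is a subclass of OSError
        return 0
    return ctx.cert_store_stats()["x509_ca"]


if __name__ == "__main__":
    # Assumption: run as the site user; path taken from the thread.
    bundle = os.path.expanduser("~/var/ssl/ca-certificates.crt")
    print(f"{bundle}: {trusted_ca_count(bundle)} CA certificates loaded")
```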

We are using the following OS:

cat /etc/os-release
NAME="Rocky Linux"
VERSION="8.8 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.8 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.8"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.8"

We attempted to relocate the installation.

→ We created an encrypted backup.

We set up a new host running Rocky 9 with the CRB repository enabled (according to the setup guide) and installed Checkmk version 2.2.0p9.

While the restoration from the backup was successful, the issues persist.

Does no one have any idea on this topic? We’re pretty stuck here and have no clue how to continue from this point.

Also tested with 2.2.0p9 (CEE) on Debian 12 (restoring the backup of the master site): same issue.

We had to disable TLS for the site connections to prevent the disconnections between the sites that happen after some time when the write permissions on ~/var/ssl are removed.

Still affected by this problem.

Is there a way to export just certain things from Checkmk, like the checks and hosts (instead of everything with omd backup / restore)?

Bumping thread…

Can someone assist here?

What is the global setting for “Trusted certificate authorities for SSL”, and what is the site-specific setting of the central site here?


In the Global Settings, the option “Trust system wide configured CAs” is checked (true).

Site specific (master site) is checked (true) as well.

We have “Trust system wide configured CAs” ON. But we have already tested setting it to “OFF” and enabling the setting “Checkmk specific”. No change.
The point is that CMK trusts the CA, but truncates the file…

Same behavior after updating to CMK 2.2.0p11.cee.

We did the same on our end. But no matter what we try to set up, the behavior of truncating the file (and keeping it empty afterwards) remains the same.

We assume that the upgrade from 1.5/1.6 to 2.x messed up something in the Python setup of the master site (I think there was an upgrade from Python 2.x to Python 3.x). Unfortunately, a backup and restore seems to restore the Python libs as well.

Simply creating a new master site is also not possible with ~2000 hosts / ~90000 services. If there is a way to extract just this data (hosts and checks), I would also be happy with that solution. Alternatively, maybe someone could explain why we are facing this issue and how to solve it.

We will also try to update all sites to 2.2.0p11.cee soon and will provide an update once done…

Same behavior after updating to CMK 2.2.0p12.cee

Would it be possible for you to open a ticket with us and provide a support diagnostics dump?
We need to look at the Checkmk logs (maybe we have to raise the log level) and the underlying OS logs.
I can’t reproduce this, either on a fresh 2.1.0p11.cee or after updating a site from 2.0.0p38 > 2.1.0p34 > 2.2.0p11.

Ok, let’s try this. What data should I include in the diagnostics dump? And what exactly should I do with it?

Detailed instructions on how to create such a dump are available in our official guide:

As a starting point, please check the following boxes when creating the dump, and send us the file:

  • Local Files
  • OMD Config
  • Checkmk Overview
  • Checkmk Log files

As soon as the ticket is in, we can discuss it there and post the final outcome here.

Does “ticket” mean sending an email to feedback@checkmk.com? Hopefully not, because the dump contains quite sensitive data…
Or will a ticket be created automatically when I submit the diagnostics data?