OMD Performance check CRIT for Site certificate

CMK version: 2.4.0p21.cme
OS version: RHEL 9.5 (Plow)

Error message: OMD Performance critical for “Site certificate valid until 2026-02-16, Expired 10 hours 20 minutes ago”

Output of “cmk --debug -vvn hostname”:

<<<livestatus_ssl_certs:sep(124)>>>
[sitename]
/omd/sites/sitename/etc/ssl/ca.pem|1771286400
/omd/sites/sitename/etc/ssl/sites/sitename.pem|1771286400

It appears to be the same or a similar issue as in this post, where Werk 17903 is cited as the fix in 2.4.0p1. The epoch value 1771286400 converts to 2026-02-17 00:00 UTC, which is still 2026-02-16 in timezones west of UTC, so it matches the expiration in the error message.
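
To double-check the epoch conversion, here is a minimal Python sketch that parses the agent section pasted above and converts each timestamp to a UTC date. The helper name parse_ssl_cert_section is hypothetical (it is not the parser Checkmk itself uses), and it assumes only the layout visible in the output: a [site] header followed by path|epoch lines, with sep(124) meaning fields are separated by '|'.

```python
from datetime import datetime, timezone

# Agent output copied from the post (site name anonymised as "sitename").
SECTION = """<<<livestatus_ssl_certs:sep(124)>>>
[sitename]
/omd/sites/sitename/etc/ssl/ca.pem|1771286400
/omd/sites/sitename/etc/ssl/sites/sitename.pem|1771286400"""


def parse_ssl_cert_section(text):
    """Hypothetical helper: return {site: {cert_path: expiry as UTC datetime}}.

    Assumes a [site] header line followed by 'path|epoch' lines, as in the post.
    """
    result = {}
    site = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("<<<"):
            continue  # skip blank lines and the section header
        if line.startswith("[") and line.endswith("]"):
            site = line[1:-1]  # new site block
            result[site] = {}
            continue
        if site is None:
            raise ValueError(f"data line before any [site] header: {line!r}")
        path, epoch = line.rsplit("|", 1)  # sep(124) is ASCII '|'
        result[site][path] = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return result


for site, certs in parse_ssl_cert_section(SECTION).items():
    for path, expiry in certs.items():
        print(site, path, expiry.isoformat())
```

For both certificates this prints 2026-02-17T00:00:00+00:00, i.e. the agent reports the same expiry for the CA and the site certificate.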

Additional circumstances: This started immediately after upgrading from CME 2.3.0p38 to 2.3.0p42 and then to 2.4.0p21. The distributed nodes that are monitored and report this issue (a customer cluster with multiple distributed nodes) are still on 2.3.0p38, but they run the CheckMK Agent 2.4.0p21 so that our corporate cluster can monitor node/server performance.

The distributed nodes' accepted/trusted certificates are set to expire far in the future (as seen in Global settings > Trusted certificates). As a test, I went to the customer CheckMK web UI on the central site, deleted all trusted certificates, and then re-accepted them. No change.

Checking manually with openssl shows a notAfter date in 2034, i.e. a 10-year validity period:

$ sudo openssl x509 -noout -dates -in /omd/sites/<nodename>/etc/ssl/sites/<nodename>.pem
notBefore=Nov 15 22:27:49 2024 GMT
notAfter=Nov 15 22:27:49 2034 GMT
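
To put the two values side by side, the sketch below converts the openssl notAfter string into an epoch with the standard-library function ssl.cert_time_to_seconds (which parses the "%b %d %H:%M:%S %Y GMT" format openssl prints, interpreting it as UTC) and compares it with the epoch from the agent section. Both constants are copied from the post; treating them as the same certificate is an assumption based on the paths shown.

```python
import ssl
from datetime import datetime, timezone

# notAfter value from the openssl output above.
OPENSSL_NOT_AFTER = "Nov 15 22:27:49 2034 GMT"
# Expiry epoch the agent section reported for the site certificate.
AGENT_EPOCH = 1771286400

# Parse openssl's date format into seconds since the epoch (UTC).
cert_epoch = ssl.cert_time_to_seconds(OPENSSL_NOT_AFTER)

print("openssl notAfter:", datetime.fromtimestamp(cert_epoch, tz=timezone.utc).isoformat())
print("agent reports:   ", datetime.fromtimestamp(AGENT_EPOCH, tz=timezone.utc).isoformat())
# Whole days between the real expiry and the value the agent reported.
print("difference (days):", (cert_epoch - AGENT_EPOCH) // 86400)
```

A large positive difference means the certificate on disk is fine and the agent section is reporting the wrong value.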

I will be updating the customer portal site(s) tomorrow night, so this might self-correct if the version mismatch between the CheckMK Agent and the CheckMK server is causing the problem.

Posting here to see if this makes sense or if I should gather more information in case this is a regression.

Thanks in advance,

Scotsie

The issue will likely resolve once you have updated the remote sites to 2.4.
Starting with 2.4, the agent section is generated by /bin/cmk-monitor-core, a utility managed by the site.
In 2.3, the agent generated that section itself, but to allow for non-root agent mode we moved some of that functionality into separate binaries.
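
To illustrate what the section should contain once it is produced correctly, here is a hypothetical Python sketch that renders the same livestatus_ssl_certs format from certificate paths and openssl-style notAfter strings. It is not the implementation of cmk-monitor-core; the function name and its input structure are assumptions, and reading the dates out of the PEM files is left out (they are passed in directly).

```python
import ssl


def render_ssl_cert_section(site_certs):
    """Hypothetical renderer for a livestatus_ssl_certs-style section.

    site_certs maps a site name to {cert_path: notAfter string}, where each
    notAfter string uses openssl's "%b %d %H:%M:%S %Y GMT" format.
    """
    lines = ["<<<livestatus_ssl_certs:sep(124)>>>"]
    for site, certs in site_certs.items():
        lines.append(f"[{site}]")
        for path, not_after in certs.items():
            # sep(124): fields separated by '|'; expiry given as a UTC epoch.
            lines.append(f"{path}|{ssl.cert_time_to_seconds(not_after)}")
    return "\n".join(lines)


print(render_ssl_cert_section({
    "sitename": {
        "/omd/sites/sitename/etc/ssl/sites/sitename.pem": "Nov 15 22:27:49 2034 GMT",
    },
}))
```

Fed the 2034 notAfter date from openssl, the rendered epoch points to November 2034 rather than to the 1771286400 (February 2026) value reported before the remote sites were updated.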

Hey Martin,

Thank you for the confirmation. I updated the customer cluster nodes last night during a maintenance window, and afterwards the alarm(s) did indeed clear.

And, in a show of supreme confidence in its longevity, the certificate expiration now reads “Site certificate valid until 3021-10-15, Expiring in: 996 years 113 days”.

Sincerely,

Scotsie