NTP Time service check problems after 1.6 to 2.0.0p12.cee upgrade

I’m having a number of problems with the NTP Time check after an upgrade. Hosts running chrony appear to be fine, and the new timesyncd support is nice, but the trouble is with a number of hosts running ntpsec (a fork of ntpd with some serious security and performance cleanup, which is officially supported by some distros, such as Debian).

Problem 1 was the assumption in check_mk_agent that, if systemd is in charge of things, ntp.service and ntpd.service are the only possible unit names. Adding ntpsec.service to that code fixed things to the extent that, if I manually run check_mk_agent on the client machine, the /var/lib/check_mk_agent/cache/ntp.cache file gets updated and the data shows up in check_mk.
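For anyone who wants to see the shape of that change, here is a minimal sketch of the kind of unit lookup involved. It is not the agent's actual code: the function name first_active_ntp_unit is made up for illustration, and the only pieces taken from above are the three unit names. The check relies on `systemctl is-active --quiet`, which exits 0 only for an active unit.

```shell
#!/bin/sh
# Hypothetical helper (NOT copied from check_mk_agent): print the first
# active NTP-style systemd unit, trying ntpsec.service in addition to
# ntp.service and ntpd.service.
first_active_ntp_unit() {
    for unit in ntp.service ntpd.service ntpsec.service; do
        # is-active --quiet exits 0 only when the unit is currently active
        if systemctl is-active --quiet "$unit" 2>/dev/null; then
            printf '%s\n' "$unit"
            return 0
        fi
    done
    return 1  # none of the candidate units is active
}

# Usage on a client:
#   first_active_ntp_unit || echo "no NTP unit active"
```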

Problem 2 is that while manually running check_mk_agent on the client updates the cache, the cache does not get updated when the agent is called over port 6556 by my check_mk server. As best I can tell, the cached data is returned every time, but the contents of the cache are never refreshed. This may be related to…

Problem 3 is that discovery for NTP Time does not appear to be working for my clients that run ntpsec. For now I’ve set up Enforced services → State of NTP time synchronisation → ntp_time for the servers in question.

Suggestions for where to look next?

Thanks.

As an example, when I run

OMD[corp1760]:~$ cmk -d moldycrow.bo.radicalconvergence.com | grep ntp.cached
<<<ntp:cached(1634160172,30)>>>

multiple times, I’d expect that timestamp to advance at least once every 30 seconds. It doesn’t change.
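To make the "is it stale?" question less of an eyeball exercise, a small helper along these lines can classify such a header. This is only a sketch: the function name cache_status is invented here, and it assumes the first number in cached(...) is the Unix time the data was generated and the second is the cache interval in seconds.

```shell
#!/bin/sh
# Classify an agent section header such as <<<ntp:cached(1634160172,30)>>>.
# Assumption: cached(CREATED,INTERVAL) = creation time (epoch seconds)
# and cache interval (seconds). cache_status is a hypothetical name.
cache_status() {
    header=$1
    now=${2:-$(date +%s)}   # optional second argument: "now" as epoch seconds
    # Extract the two numbers; prints nothing if there is no cached() marker
    fields=$(printf '%s\n' "$header" |
        sed -n 's/^<<<.*:cached(\([0-9][0-9]*\),\([0-9][0-9]*\)).*>>>$/\1 \2/p')
    if [ -z "$fields" ]; then
        echo "no cached() marker"
        return 2
    fi
    created=${fields% *}
    interval=${fields#* }
    age=$((now - created))
    if [ "$age" -le "$interval" ]; then
        echo "fresh age=${age}s interval=${interval}s"
    else
        echo "stale age=${age}s interval=${interval}s"
        return 1
    fi
}

# Usage on the monitoring site:
#   cache_status "$(cmk -d moldycrow.bo.radicalconvergence.com | grep ntp.cached)"
```

A result that stays "stale" with an ever-growing age across several runs matches the behaviour described above.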

I have now found one, but only one, of my many VMs running chrony that exhibits problem #2. If I run check_mk_agent at the CLI of the client, chrony.cache gets updated. If I execute the agent from the check_mk server, new data never gets cached.
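The same comparison can be made on the client side by looking at the cache file itself. The following is a rough sketch, assuming a Linux client with GNU stat; the function name cache_file_age is made up, and the chrony.cache path in the usage comment assumes it lives in the same cache directory as ntp.cache.

```shell
#!/bin/sh
# Print how many seconds ago a cache file was last modified.
# Assumption: GNU coreutils stat (stat -c %Y prints the mtime in epoch seconds).
# cache_file_age is a hypothetical helper name.
cache_file_age() {
    file=$1
    now=${2:-$(date +%s)}   # optional second argument: "now" as epoch seconds
    mtime=$(stat -c %Y "$file") || return 2   # missing or unreadable file
    echo $((now - mtime))
}

# Usage on the client, after the server has queried the agent:
#   cache_file_age /var/lib/check_mk_agent/cache/chrony.cache
```

If the age only resets after a manual run of check_mk_agent and keeps growing after queries from the server, the host shows problem #2.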

This issue (check_mk_agent started via systemd socket unit and not updating cache files) is currently under investigation. We experienced that too and opened a developer ticket.

Ah ha!

Thanks for the feedback. Apparently this problem has uncovered 7 devices that run ntpsec or chrony and didn’t actually follow the local standard of using xinetd instead of a systemd-managed socket.

One of the problematic devices started updating the cache file as soon as I switched over to using xinetd.

Now to figure out why /etc/xinetd.d/check-mk-agent was never installed on the 6 other devices, all of which run Debian.

I think the new agents use systemd, not xinetd. I just tried downloading and installing the agent from my site. The result: no xinetd configuration was installed, only a symlink to systemd.

Interesting…

Looking around, it seems the xinetd support was ripped out on some, but not all, of the devices where I recently upgraded the agents from 1.6 to 2.0. In 1.6, systemd sockets were configured if and only if xinetd was not installed. I actually have an internal document I wrote to tell people how to recover if they installed the agent first and xinetd second.

If this was a deliberate change, I find it a bit odd, as the documented preference from tribe29 in 1.6 was to use xinetd, and we did indeed find that considerably more stable than using systemd sockets. Is there any documentation on what has changed?

Thanks!

So I found this in the official check_mk docs

Yup, that’s what I used as my “marching orders” in 1.6. Note that, unfortunately, the last update of that page was on 23-Apr-2018, and 2.0.0 updates (if any) aren’t there yet. I should have been clearer: Has anybody seen documentation of deliberate changes to this in 2.0? I’ve not uncovered anything.

So I tested xinetd/systemd socket use on a nice, clean container starting from a clean configuration.

Tests with Ubuntu 20.04.3 LTS

No xinetd installed, installed agent 1.6.0p25 -> systemd socket, with a warning

xinetd installed, installed agent 1.6.0p25 -> xinetd

xinetd installed, installed agent 1.6.0p25, then installed 2.0.0p12 over it -> converted to systemd socket, config file removed from /etc/xinetd.d

xinetd installed, installed agent 2.0.0p12 -> systemd socket
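To repeat this kind of check across a fleet, something like the following sketch can report which mechanism a host appears to be using. The function name agent_transport is hypothetical, the default socket unit name check_mk.socket is an assumption that may not match every agent version, and the default xinetd path is the one mentioned earlier in this thread.

```shell
#!/bin/sh
# Report how the agent appears to be exposed on this host.
# agent_transport is a hypothetical helper; the default unit name
# check_mk.socket is an ASSUMPTION and may differ between agent versions.
agent_transport() {
    xinetd_conf=${1:-/etc/xinetd.d/check-mk-agent}
    socket_unit=${2:-check_mk.socket}
    # An xinetd service file counts only if it is not marked "disable = yes"
    if [ -f "$xinetd_conf" ] &&
        ! grep -Eq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*yes' "$xinetd_conf"; then
        echo xinetd
    elif systemctl is-active --quiet "$socket_unit" 2>/dev/null; then
        echo systemd-socket
    else
        echo unknown
    fi
}

# Usage on a client:
#   agent_transport
```

The sketch only inspects configuration; it does not verify that xinetd itself is running or which process actually listens on port 6556.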

So it looks like this is deliberate, and I’m a bit sad, as the systemd socket setup obviously has some problems that I can make go away by manually reverting to xinetd with the 2.0.0p12 agent.
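For reference, a manual revert boils down to giving xinetd a service stanza for port 6556 and stopping the systemd socket so the two do not compete for the port. The sketch below only prints such a stanza; it is modelled on a typical xinetd service definition rather than copied from any 2.0 package. The function name render_xinetd_stanza, the service name check_mk, and the default agent path /usr/bin/check_mk_agent are assumptions to adjust for the local installation.

```shell
#!/bin/sh
# Print a minimal xinetd service stanza for the agent on port 6556.
# ASSUMPTIONS: the service name "check_mk" and the default agent path are
# illustrative; render_xinetd_stanza is a hypothetical helper name.
render_xinetd_stanza() {
    agent_path=${1:-/usr/bin/check_mk_agent}
    cat <<EOF
service check_mk
{
        type           = UNLISTED
        port           = 6556
        socket_type    = stream
        protocol       = tcp
        wait           = no
        user           = root
        server         = $agent_path
        disable        = no
}
EOF
}

# Possible usage (as root), after disabling the systemd socket unit:
#   render_xinetd_stanza > /etc/xinetd.d/check-mk-agent
#   then restart xinetd
```

Printing to stdout keeps the sketch side-effect free, so the result can be reviewed before anything under /etc is touched.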

I’ll quietly ignore that on some of my production hosts the xinetd configuration wasn’t actually fully cleaned up…

At least I’m no longer sitting here thinking I was careless and left that many hosts without a properly set up xinetd.

Thanks everyone for your input.

Thanks for figuring it out. So now we need to wait and see when, or if, check_mk works out why there is a problem with NTP when using the systemd socket.