Consistent NTP Time Offset Issues Across Multiple Linux Clients

CMK version: Checkmk Enterprise Edition 2.3.0p7
OS version: Debian 12

Error message: Offset: -10916.0000 ms (warn/crit below -1600.0000 ms/-1800.0000 ms)CRIT, Stratum: 1, Jitter: 0.0710 ms, Time since last sync: 6 minutes 35 seconds

Hello,

We’ve been observing recurring NTP time offset errors on several hosts for some time now. Despite reconfiguring the NTP servers (on both the Checkmk server and the clients) and trying other troubleshooting steps, the issue persists.

Interestingly, the offset occurs at a consistent time of day on each client. For example, on client X the issue consistently happens between 04:30 and 05:30, while on client Y it occurs between 19:00 and 20:00, and so on.

There’s no clear trigger such as a system update or configuration change that correlates with the onset of the problem. Moreover, the issue does not affect all Linux clients and has been observed on both Red Hat and Debian-based systems.

We would appreciate any insights or suggestions you might have to help resolve this issue.

I have something similar.
The problem I see occurs with both the Raw and Enterprise editions:

Offset: 0.1708 ms, Stratum: 2, Time since last sync: 17 minutes 58 seconds (warn/crit at 17 minutes 5 seconds/1 hour 0 minutes)WARN

However, it still says Stratum 2, while the Chrony service is supposed to be using Stratum 1 servers, which makes this message rather strange. Also, an offset of 0.1708 ms should be perfectly fine, yet it still triggers a warning?
This happens on every host with Chrony installed so far (some are Proxmox LXC containers without access to change NTP, so there is no need to use it there).
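A note on the stratum value: in NTP's numbering, a client reports a stratum one higher than the server it synchronizes to, so a host following stratum 1 servers normally shows Stratum 2. The Python sketch below models only that rule (per RFC 5905: stratum 1 is a primary server, 2 to 15 are secondary servers, 16 means unsynchronized); `local_stratum` is a hypothetical helper, not part of chrony or Checkmk.

```python
# Simplified model of NTP stratum numbering (RFC 5905):
# 1 = primary server with a reference clock, 2..15 = secondary, 16 = unsynchronized.
UNSYNCHRONIZED = 16


def local_stratum(upstream_stratum: int) -> int:
    """Stratum a client reports after syncing to a server of the given stratum."""
    if not 1 <= upstream_stratum <= 15:
        raise ValueError("upstream stratum must be between 1 and 15")
    # The client is one hop further from the reference clock than its source;
    # a result of 16 means the client counts as unsynchronized.
    return min(upstream_stratum + 1, UNSYNCHRONIZED)


if __name__ == "__main__":
    # A host syncing to a stratum 1 server reports stratum 2.
    print(local_stratum(1))
```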

In your case, the problem (and the warning message) is completely different.

The original post shows a real NTP problem. If the affected machines are VMs, this can happen while a backup or snapshot is being created.
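Since the offsets cluster in a fixed window per client, one way to test the backup/snapshot theory is to bucket large offsets by hour of day and see which hours recur. A minimal Python sketch, assuming you have exported (timestamp, offset in ms) pairs from somewhere such as the service graph; the sample values and the `bad_hours` helper are hypothetical, and the 1600 ms limit is simply the warning level from the first message.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample data for illustration, not measurements from this thread.
SAMPLES = [
    (datetime(2024, 6, 3, 4, 40), -10000.0),
    (datetime(2024, 6, 3, 12, 0), 0.3),
    (datetime(2024, 6, 4, 4, 50), -9500.0),
    (datetime(2024, 6, 4, 18, 0), -0.2),
]


def bad_hours(samples, limit_ms=1600.0):
    """Count, per hour of day, the samples whose |offset| exceeds limit_ms."""
    counts = defaultdict(int)
    for ts, offset_ms in samples:
        if abs(offset_ms) > limit_ms:
            counts[ts.hour] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # An hour that recurs across days points at a scheduled job.
    print(bad_hours(SAMPLES))  # → {4: 2}
```

If the recurring hour matches the backup or snapshot schedule for that client, that supports the theory above.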

@Power2All your warning can be fixed by raising the threshold for the time between syncs. A 17-minute warning level is too low on some systems. The graph of this service shows how long your machine waits between syncs; I would then set the threshold to a value slightly above what you see in the graph.
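Turning "slightly above what you see in the graph" into numbers could look like the Python sketch below. The 20% margin, rounding up to whole minutes, and making crit twice warn but at least one hour are all assumptions for illustration, not Checkmk defaults; `suggest_levels` is a hypothetical helper.

```python
import math


def suggest_levels(observed_intervals_s, margin_pct=20, crit_floor_s=3600):
    """Suggest (warn, crit) in seconds for 'time since last sync'.

    Hypothetical policy: warn is margin_pct above the longest observed
    interval, rounded up to a whole minute; crit is twice warn, but never
    below crit_floor_s.
    """
    longest = max(observed_intervals_s)
    # Integer numerator keeps the ceiling exact at whole-minute boundaries.
    warn = math.ceil(longest * (100 + margin_pct) / 6000) * 60
    crit = max(2 * warn, crit_floor_s)
    return warn, crit


if __name__ == "__main__":
    # Hypothetical gaps read from a graph, in seconds.
    print(suggest_levels([1024, 1500, 1805]))
```

For those inputs it returns `(2220, 4440)`, i.e. warn at 37 minutes and crit at 74 minutes for a longest observed gap of 30 minutes 5 seconds.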


Yes, that was it for me.
Thanks for your response; the company I work for had the same issue.
At least I can give them the solution :slight_smile:

Sadly, I’ve tested this, and it didn’t help.
I raised the threshold to 30 minutes instead of the default, but it’s still an issue.

Offset: 0.0054 ms, Stratum: 2, Time since last sync: 30 minutes 5 seconds (warn/crit at 30 minutes 0 seconds/1 hour 0 minutes)WARN

It’s strange, though: it shows an offset of 0.0054 ms, which is not even a whole millisecond and shouldn’t trigger a warning. I also asked claude.ai whether this chrony configuration updates every 15 to 18 minutes, and it says it should.

So I’m wondering whether this is a shortcoming of Checkmk or just another configuration issue?

Here is my chrony config to be sure (ignore the _ before each DNS name; the forum wouldn’t let me post “links”):

server _0.nl.pool.ntp.org iburst prefer
server _1.nl.pool.ntp.org iburst
server _2.nl.pool.ntp.org iburst
server _3.nl.pool.ntp.org iburst
server _time.cloudflare.com iburst
server _time1.google.com iburst
server _time2.google.com iburst

keyfile /etc/chrony/chrony.keys
driftfile /var/lib/chrony/chrony.drift

log tracking measurements statistics
logdir /var/log/chrony

maxupdateskew 100.0

rtcsync

makestep 1.0 3

dumpdir /var/lib/chrony
dumponexit

leapsectz right/UTC

ratelimit interval 3 burst 8

clientloglimit 100000

cmdallow 127.0.0.1
bindcmdaddress 127.0.0.1

stratumweight 0.05
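Regarding the 15 to 18 minute question: chrony's documented defaults for the `server` directive are `minpoll 6` and `maxpoll 10`, i.e. polling between 2^6 = 64 s and 2^10 = 1024 s (17 minutes 4 seconds). None of the server lines above set these options, so the defaults apply. The default warning level in the first message, 17 minutes 5 seconds, is 1025 s, one second above that maximum. The Python sketch below reads the poll range from `server` lines; the `ntp.example.com` line with `maxpoll 8` is a hypothetical example, and the sketch ignores anything else that can delay an update (unreachable or rejected sources), so the gap Checkmk reports can still be longer, as the 30-minute case shows.

```python
# chrony's documented defaults for the `server` directive.
DEFAULT_MINPOLL = 6   # 2**6  = 64 s
DEFAULT_MAXPOLL = 10  # 2**10 = 1024 s (17 min 4 s)

# Server lines from the config above (underscores removed), plus one
# hypothetical line with an explicit maxpoll for comparison.
CONFIG = """\
server 0.nl.pool.ntp.org iburst prefer
server time.cloudflare.com iburst
server ntp.example.com iburst maxpoll 8
"""


def poll_range(line):
    """Return (shortest, longest) polling interval in seconds for a server line."""
    tokens = line.split()
    opts = {"minpoll": DEFAULT_MINPOLL, "maxpoll": DEFAULT_MAXPOLL}
    # Options take the form "<name> <value>", so check each adjacent pair.
    for name, value in zip(tokens, tokens[1:]):
        if name in opts:
            opts[name] = int(value)
    return 2 ** opts["minpoll"], 2 ** opts["maxpoll"]


if __name__ == "__main__":
    for line in CONFIG.splitlines():
        if line.startswith("server "):
            host = line.split()[1]
            shortest, longest = poll_range(line)
            print(f"{host}: {shortest} s to {longest} s")
```

Lowering `maxpoll` shortens the longest gap between polls, at the cost of more queries to the servers.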

The warning is only about the time between updates, not about the time offset.
In your case, it takes longer than 30 minutes for an update to happen.
I have some servers where it takes around 30 minutes until the next update.

You can only see how long your server waits between updates in the graph of the “Systemd Timesyncd Time” service. The chrony and ntpd services only graph the time offset.
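To make the distinction concrete: the result is the worst of two separate comparisons, the offset against its levels and the time since the last sync against its levels, so a tiny offset does not prevent a WARN from the sync age. The Python sketch below is a simplified model of that behaviour, not Checkmk's actual implementation. It assumes a level is reached at `>=`, treats the offset levels as symmetric (absolute value), and uses the levels shown in this thread (1600/1800 ms offset, 1025 s/3600 s sync age); the function names are hypothetical.

```python
STATES = ["OK", "WARN", "CRIT"]


def level_state(value, warn, crit):
    """Upper levels: WARN once value reaches warn, CRIT once it reaches crit."""
    if value >= crit:
        return "CRIT"
    if value >= warn:
        return "WARN"
    return "OK"


def time_sync_state(offset_ms, age_s,
                    offset_levels_ms=(1600.0, 1800.0),
                    age_levels_s=(1025, 3600)):
    """Worst state of the offset comparison and the sync-age comparison."""
    offset_state = level_state(abs(offset_ms), *offset_levels_ms)
    age_state = level_state(age_s, *age_levels_s)
    return max(offset_state, age_state, key=STATES.index)


if __name__ == "__main__":
    # The three messages from this thread:
    print(time_sync_state(-10916.0, 6 * 60 + 35))  # offset too large
    print(time_sync_state(0.1708, 17 * 60 + 58))   # sync age only
    print(time_sync_state(0.0054, 30 * 60 + 5, age_levels_s=(1800, 3600)))
```

The first call yields CRIT because of the offset, while the other two yield WARN purely from the sync age, matching the states shown in the messages above.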

Mmm, I see.
What values would you suggest for warning and critical in this case?
I’m still trying to figure out how Checkmk works, so I’m quite new to this.
I never ran into this with Zabbix, but I have to use Checkmk at my new workplace :wink: