Cached Agent Data Error

Version: 2.1.0p25 CRE
OS: CentOS 7

This service is based on cached agent data and cannot be rescheduled. Cache generated 53 y ago, cache interval: 0.00 s

OK, I have a bizarre issue that I am having trouble pinning down.

I have a distributed monitoring topology, and ALL hosts on one poller get the above error when I try to reschedule checks. Currently the checks on every host show “1970-01-01 02:00:00” as the time of their last check.

I took one host from this poller, moved it to another poller, rediscovered its checks, and confirmed that I can reschedule checks properly there. After moving it back, I get the same error. (Note that it says the cache is 53 years old, so something is seeing a default 1970 date.)
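For anyone wondering where “53 y ago” and “02:00:00” come from: both are consistent with a timestamp of 0 (the Unix epoch) being used as the cache time. Here is a minimal sketch of that arithmetic. The "now" date and the UTC+2 display offset are assumptions for illustration (the thread does not state either), and this is not how the monitoring software itself computes the value.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_YEAR = 365.25 * 86400


def cache_age_years(cached_at: float, now: float) -> int:
    """Whole years between a cache timestamp and 'now' (both Unix epoch seconds)."""
    return int((now - cached_at) // SECONDS_PER_YEAR)


# Assumption: the error was seen some time in 2023; the exact date is hypothetical.
assumed_now = datetime(2023, 6, 1, tzinfo=timezone.utc).timestamp()

# A cache timestamp of 0 means "1970-01-01 00:00:00 UTC".
age = cache_age_years(0, assumed_now)

# Assumption: the GUI renders times in a UTC+2 zone, which would explain "02:00:00".
display_zone = timezone(timedelta(hours=2))
rendered = datetime.fromtimestamp(0, tz=display_zone).strftime("%Y-%m-%d %H:%M:%S")

print(age, rendered)
```

So the odd-looking date does not need a 53-year-old file anywhere; an unset or zero timestamp is enough to produce it.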

Further, I have confirmed the issue seems to be related to my central server: if I log on to the affected poller directly, I can reschedule checks properly.

On both the poller and the master server I ran a find with an mtime modifier of +10000 days (enough to filter out almost all false positives) and confirmed there isn’t any actual file with a Jan 1 1970 mtime. So I am assuming there is a cache file on my master server with errant data inside it? But I am struggling to find it (or to simply force a cache refresh via the OMD CLI or GUI).
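The mtime search described above can be sketched in Python roughly as follows. This only approximates what `find -mtime +10000` does (find rounds ages down to whole days), and the function name `find_old_files` is hypothetical. Like the find command, it looks only at file metadata, so it cannot detect a zero timestamp stored inside a file's contents.

```python
import os
import time


def find_old_files(root, min_age_days=10000):
    """Return paths of files under 'root' whose mtime is older than min_age_days."""
    cutoff = time.time() - min_age_days * 86400
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that broken symlinks do not raise
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                hits.append(path)
    return sorted(hits)
```

Calling `find_old_files("/some/site/dir")` with a directory of your choice returns an empty list when nothing is that old, which matches what I saw on both servers.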

You might want to check the time synchronization of the remote site. It looks like its clock is way off.

Literally at a loss here… This was one of the first things I checked, as it seemed obvious… Time looked right on the server though.

I checked again this AM just to make sure I didn’t imagine checking it, and the time on the poller having issues was off from the rest of the estate by exactly 1 hour. (I am researching why exactly; for now I just set it manually.) I have confirmed this fixed the issue.

I am still somewhat confused by the way the issue presented. The date stamp, I think, makes sense: I assume that’s just what is set by default when a host is added, and it gets updated on the first polling cycle, which was failing due to the time drift? But what makes this confusing is that the hosts were all updating, graphs had current data, and the dates on the graphs were accurate; only the last check was stuck in 1970, and you could not manually reschedule a check.

But in the event someone finds this thread in the future: the time difference between the servers definitely caused this. (I was just looking for a MUCH wider gap in time than existed, so I missed it until I looked at the 2 servers side by side!)
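Since the gap was easy to miss by eye, here is a minimal sketch of the side-by-side comparison: take a wall-clock reading from each server at the same moment and compute the difference explicitly. The function names, the timestamp format, the sample readings, and the 5-second tolerance are all hypothetical choices for illustration.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def clock_offset_seconds(reading_a: str, reading_b: str) -> float:
    """Seconds by which reading_b is ahead of reading_a (negative if behind)."""
    a = datetime.strptime(reading_a, TIME_FORMAT)
    b = datetime.strptime(reading_b, TIME_FORMAT)
    return (b - a).total_seconds()


def is_suspicious(offset_seconds: float, tolerance_seconds: float = 5.0) -> bool:
    """Flag any offset larger than the tolerance, however small it looks."""
    return abs(offset_seconds) > tolerance_seconds


# Hypothetical readings taken at the same moment on the central server and the poller.
offset = clock_offset_seconds("2023-06-01 09:00:00", "2023-06-01 10:00:00")
print(offset, is_suspicious(offset))
```

An offset of exactly 3600 seconds, as in this made-up example, points at a timezone or DST setting rather than ordinary clock drift.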

So the solution was literally the first thing I checked; I just missed it, which is a bit of a stinger!

Thanks Robin!!

Glad I could help!

Are you sure it was just “one hour” and not “one hour and several years”? :upside_down_face:

Yep, exactly one hour. I goofed something up in the NTP config, so time was synced but one server was not set to adjust for DST…

Literally the dumbest thing it could have been, and I totally missed it on the first pass!

I’m surprised a bunch more stuff did not break as well, honestly…

I know the feeling. It is almost always the easiest answer and you either forget to check for it, or you overlook it. Been there, done that.

Take care, friend! :vulcan_salute:
