An update on my investigation into the spikes some users are seeing:
- One user found that his spikes coincide with the Checkmk server running at 100% CPU usage at the time each spike occurred. He assigned more CPUs to the machine and is now fine.
- Two users provided debug log data showing that the devices (Cisco and Juniper) really did report wrong, far too high in and out octet counters.
It looks like there are buggy switch/router devices, or at least buggy firmware versions, out there that report wrong in/out octets from time to time:
- We have a lot of partners whose customers have thousands of ports/interfaces and have never seen such spikes.
I no longer think that there is a problem with Checkmk.
Upgrading your switch/router to the latest firmware may fix the spikes.
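
To illustrate why a single wrong counter value from a device shows up as a spike, here is a minimal Python sketch. It is not Checkmk's actual code; the function names, the 60 second interval, the 1 Gbit/s link speed and the sample values are all made up for the example. It derives a rate from consecutive octet counter readings and flags any rate the link could not physically carry.

```python
# Hypothetical illustration only: names and numbers are assumptions,
# not taken from Checkmk or from the users' debug logs.

def octet_rate(prev, curr, interval_s, counter_bits=64):
    """Bytes per second between two counter readings, allowing for one wrap."""
    delta = curr - prev
    if delta < 0:
        # A smaller reading is interpreted as a counter wrap-around.
        delta += 2 ** counter_bits
    return delta / interval_s


def is_plausible(rate_bytes_per_s, link_speed_bps):
    """A rate above the link speed cannot be real traffic."""
    return rate_bytes_per_s * 8 <= link_speed_bps


INTERVAL_S = 60                # assumed check interval
LINK_SPEED_BPS = 1_000_000_000  # assumed 1 Gbit/s interface

# The third reading is a bogus value, as a buggy device might report it.
samples = [1_000_000, 1_600_000, 900_001_600_000, 2_200_000, 2_800_000]

rates = [
    octet_rate(prev, curr, INTERVAL_S)
    for prev, curr in zip(samples, samples[1:])
]
flags = [is_plausible(rate, LINK_SPEED_BPS) for rate in rates]

for rate, ok in zip(rates, flags):
    print(f"{rate:.0f} B/s  {'ok' if ok else 'implausible, likely a bad counter'}")
```

Note that one bad reading produces two implausible rates in this sketch: one when the counter jumps up, and one when it falls back and the drop is interpreted as a wrap-around.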