Weird spikes on interfaces

I am still unable to reproduce these spikes in my lab,
not even by turning the periodic service discovery on and off.

However, I have now extended the Checkmk interface check (lib/python3/cmk/base/plugins/agent_based/utils/interfaces.py) to log the raw data it receives and the values it calculates from that data.

Please send me a private message if you are interested in using this to debug your interface spikes.

Some background:

The bandwidth rate is calculated using the previous value and its timestamp from the value store.
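To make the arithmetic concrete, here is a minimal sketch of a counter-to-rate calculation. It is not the actual Checkmk code; the function name `counter_rate` and the handling of missing or decreasing values are my assumptions for illustration.

```python
from typing import Optional, Tuple


def counter_rate(
    prev: Optional[Tuple[float, int]],  # (timestamp, counter) as kept in a value store
    now_ts: float,
    now_value: int,
) -> Optional[float]:
    """Return the per-second rate between the previous and the current sample.

    Hypothetical helper: returns None when no rate can be derived
    (no previous sample, no elapsed time, or a counter that went backwards).
    """
    if prev is None:
        return None
    prev_ts, prev_value = prev
    elapsed = now_ts - prev_ts
    if elapsed <= 0 or now_value < prev_value:
        return None
    return (now_value - prev_value) / elapsed


# Same counter delta, different elapsed time between the two samples:
normal = counter_rate((100.0, 1000), 160.0, 121000)  # 120000 octets over 60 s
short = counter_rate((159.0, 1000), 160.0, 121000)   # 120000 octets over 1 s
```

The rate depends on both stored values, the previous counter and the previous timestamp, which is why the modified check logs them together with the raw counters.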

The modified interfaces.py creates a folder structure interface_logs/yyyy/mm/dd//<interface_fs_save>.log and writes one JSON line on each check interval:

OMD[central]:~$ tail -n1 interface_logs/2023/08/16/myhost.mycompany.tld/tun0.log | jq ''
{
  "timestamp": 1692195397.2286897,
  "host_name": "myhost.mycompany.tld",
  "item": "tun0",
  "in_octets": 136673920,    <--- raw rx counter data
  "out_octets": 24733351,   <--- raw tx counter data
  "value_store": {
    "outtraffic.None": [
      1692195397.2286897,
      24733351
    ],  
    "intraffic.None": [
      1692195397.2286897,
      136673920
    ]   
  },  
  "rates_dict": {
    "intraffic": 2037.9487204080037,
    "outtraffic": 750.7709995843549
  }
}
  • Storing this data allows you to analyze what happened after the fact.
  • Logs are stored in ~/interface_logs/yyyy/mm/dd folders so that you can easily remove old data with a cron job:
echo "0 6 * * * find ~/interface_logs/ -type f -mtime +4 -delete" > ~/etc/cron.d/custom_interface_logs
omd restart crontab
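For the back-in-time analysis, something like the following sketch could recompute the rates from consecutive log lines and put them next to the rates the check logged. It assumes each line is a plain JSON object with the fields shown in the example above (without the "<---" annotations); the function name `recompute_rates` and the output keys are made up for illustration.

```python
import json
from typing import Dict, Iterable, List, Optional


def recompute_rates(lines: Iterable[str]) -> List[Dict[str, Optional[float]]]:
    """Recompute in/out rates from consecutive JSON log lines.

    Assumes the fields "timestamp", "in_octets", "out_octets" and
    "rates_dict" as in the example log line; everything else is ignored.
    """
    rows: List[Dict[str, Optional[float]]] = []
    prev = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if prev is not None:
            elapsed = rec["timestamp"] - prev["timestamp"]
            if elapsed > 0:
                logged = rec.get("rates_dict", {})
                rows.append({
                    "timestamp": rec["timestamp"],
                    "elapsed": elapsed,
                    "in_rate": (rec["in_octets"] - prev["in_octets"]) / elapsed,
                    "out_rate": (rec["out_octets"] - prev["out_octets"]) / elapsed,
                    # Rates as logged by the check, None if missing
                    "logged_in_rate": logged.get("intraffic"),
                    "logged_out_rate": logged.get("outtraffic"),
                })
        prev = rec
    return rows


# Usage (path taken from the example above):
# with open("interface_logs/2023/08/16/myhost.mycompany.tld/tun0.log") as f:
#     for row in recompute_rates(f):
#         print(row)
```

Rows where the recomputed rate and the logged rate differ widely, or where "elapsed" is far from the check interval, are the ones worth a closer look.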