Cisco SG High CPU Usage

Hi all, I have been using Checkmk for a few years now. I started with 1.5 and have now upgraded to 2.0, but I have always had a problem with Cisco SG switches.
When I activate SNMP monitoring of interfaces, CPU usage on the switch is always high (90-95%), no matter how many interfaces are monitored (I started with a stack of 2x48 ports and ended up monitoring only two interfaces). The SNMP thread runs for a very long time (around 50-55 seconds), so with the default 1-minute check interval the switch is almost constantly being polled (and under stress), which degrades performance for other services.
I tried reducing the SNMP check frequency from every 1 minute to every 5 minutes, but whenever the check runs the CPU still spikes for almost a minute. I also tried removing the interface checks (using rules), leaving only state (up/down) and the bandwidth summary (even though I would like to see errors too), but the problem remains. I also monitor a small ASA5506 firewall (only 3 interfaces), and there I have no issues: the check takes less than one second.
When I monitor the same switch with Nagios, using custom SNMP queries for graphing interfaces, I see no CPU spikes and everything is fine, so I suspect that Checkmk is doing something else with the interfaces (maybe an snmpwalk) behind the scenes…
Has anyone experienced this issue?
Thank you.
Max

Checkmk always uses snmpwalk or snmpbulkwalk to query all elements of an OID (e.g. all interfaces). Otherwise it would not be able to do the automatic service discovery.

Unfortunately, there are vendors out there who do not implement their SNMP agents efficiently with respect to SNMP walks.

You could use the classic Nagios plugin to monitor single interfaces.
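To illustrate what such a targeted query looks like (this is not the Nagios plugin itself, only a minimal sketch), the snippet below builds a net-snmp `snmpget` command line for a few IF-MIB counters of one interface. The host address, the community string, the interface index, and the helper name `build_snmpget_command` are assumptions made up for the example; the column OIDs are the standard IF-MIB `ifTable` columns.

```python
import shlex

# Standard IF-MIB ifTable columns (ifEntry = 1.3.6.1.2.1.2.2.1).
IF_COLUMNS = {
    "ifOperStatus": "1.3.6.1.2.1.2.2.1.8",
    "ifInOctets": "1.3.6.1.2.1.2.2.1.10",
    "ifInErrors": "1.3.6.1.2.1.2.2.1.14",
    "ifOutOctets": "1.3.6.1.2.1.2.2.1.16",
    "ifOutErrors": "1.3.6.1.2.1.2.2.1.20",
}


def build_snmpget_command(host, community, if_index, columns=IF_COLUMNS):
    """Return an snmpget argument list that asks for one interface only.

    Hypothetical helper: it only builds the command, it does not run it.
    All requested OIDs are passed to a single snmpget call.
    """
    oids = [f"{base}.{if_index}" for base in columns.values()]
    return ["snmpget", "-v", "2c", "-c", community, host, *oids]


if __name__ == "__main__":
    # Assumed values: documentation address, community "public", ifIndex 3.
    command = build_snmpget_command("192.0.2.10", "public", 3)
    print(shlex.join(command))
```

A query like this touches only the listed OIDs, so the switch never has to iterate over its whole interface table. The price is that the interface index has to be known and maintained by hand.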

Thank you for the hint: I thought that, once the services had been discovered, the checks only used the information already collected, and that discovery was handled by a separate task. I agree with you that the SG SNMP engine is not well implemented, but I think it would be more efficient to use a single snmpget for each configured interface instead of an snmpwalk over everything: a walk is fine for the initial discovery, but not for every check every minute.
I will look at the classic Nagios plugin, at least for those switches (unfortunately I have a lot of them at my customer sites…).
Thank you again.
Max

From a pure protocol point of view, a walk is far more efficient than getting all values with single requests. It needs fewer packets and saves round-trip time. And a properly implemented SNMP agent also has less to do.
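As a rough illustration of the packet argument, here is a toy model that counts request/response round trips when walking one table column. The "agent" is just a sorted list of OIDs in memory; the function names and the 96-interface example are assumptions for the sketch, and it says nothing about how much CPU a real agent spends per request.

```python
from bisect import bisect_right

# One ifTable column (ifInOctets); instances are COLUMN + (ifIndex,).
COLUMN = (1, 3, 6, 1, 2, 1, 2, 2, 1, 10)


def build_mib(n_interfaces):
    """Toy MIB: one column with n instances plus the first OID after it."""
    oids = [COLUMN + (i,) for i in range(1, n_interfaces + 1)]
    oids.append((1, 3, 6, 1, 2, 1, 2, 2, 1, 11, 1))  # start of next column
    return sorted(oids)


def get_bulk(mib, oid, max_repetitions):
    """Return up to max_repetitions OIDs that follow oid lexicographically."""
    start = bisect_right(mib, oid)
    return mib[start:start + max_repetitions]


def walk(mib, root, max_repetitions=1):
    """Walk the subtree under root; return (found OIDs, requests sent).

    max_repetitions=1 behaves like a GETNEXT walk, larger values like a
    GETBULK walk.
    """
    requests, found, current = 0, [], root
    while True:
        requests += 1
        batch = get_bulk(mib, current, max_repetitions)
        if not batch:
            return found, requests
        for oid in batch:
            if oid[:len(root)] != root:  # left the subtree: walk is done
                return found, requests
            found.append(oid)
        current = batch[-1]


if __name__ == "__main__":
    mib = build_mib(96)  # e.g. a 2x48-port stack
    print("single GETs:", 96)                       # one request per value
    print("GETNEXT walk:", walk(mib, COLUMN, 1)[1])
    print("GETBULK walk:", walk(mib, COLUMN, 10)[1])
```

In this model a plain GETNEXT walk needs about as many round trips as individual GETs (one extra to detect the end of the column), and the large saving comes from bulk requests, which return many values per packet.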
