I cannot find any errors in pfSense: not in any logs, no counters going up, no trace of them that I can find. We also monitor via SNMP and IPMI, and neither reports any errors.
I suspect there’s a bug somewhere in how LAGG (bond, link aggregation) interfaces are monitored.
How can I troubleshoot this and find out whether I can safely raise the thresholds or ignore these errors?
These are errors in your network that Checkmk just reports. This is not a bug but a situation you need to analyse in your network and the devices involved. In-errors means your device is receiving packets that are corrupted and therefore discarded; for TCP traffic, the source then has to resend them. This is mostly a sign of a deeper network problem: a broken cable, a dirty fiber optic cable, a broken network card, or just buggy firmware.
The point is, pfSense (which collects a lot of statistics about itself) is not reporting any errors that I can see. Neither is the switch everything is connected to. If Checkmk is right, why isn’t it visible anywhere else?
As you may have noticed, the errors only occur on igb1 and igb2, which are members of an aggregated link (LACP bond). The WAN interface, which is a single interface, doesn’t report any errors. Same drivers, same network interface hardware, same switch.
Furthermore, lagg0 only serves as the parent for about 50 VLAN interfaces. From the Checkmk errors I cannot see what the source is, so I think it’s virtually impossible to track this down with catch-all packet captures.
I am not a network admin, and I don’t know the devices and tools you are mentioning.
I am just explaining the meaning of your post. Checkmk simply gathers this information via SNMP (or another method) and reports it; there is no magic inside Checkmk about where this information comes from. So in conclusion, your device has to know about these errors, otherwise Checkmk would not report them.
Take a look at the SNMP (or raw) data of your device and you will find these values there.
You might find further information by looking at the devices connected to these ports: they should show the corresponding counterparts as out-errors on their network interfaces.
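One way to check the raw data yourself is to poll the standard IF-MIB counter `ifInErrors` (OID 1.3.6.1.2.1.2.2.1.14) for igb1 and igb2 twice, some seconds apart, and look at how much it grew. A minimal sketch of that arithmetic, assuming the counter is a 32-bit Counter32 that may wrap around at most once between samples; the helper names and sample numbers are hypothetical, not taken from Checkmk or pfSense:

```python
COUNTER32_MODULUS = 2**32  # ifInErrors is a 32-bit counter in IF-MIB


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Growth of a monotonically increasing counter between two samples.

    Assumes at most one wrap-around between the samples. A counter reset
    (e.g. after a reboot) cannot be told apart from a wrap and yields a
    misleading value.
    """
    return (current - previous) % modulus


def errors_per_second(previous: int, current: int, interval_s: float) -> float:
    """Average error rate over the polling interval (hypothetical helper)."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return counter_delta(previous, current) / interval_s


# Two hypothetical samples of ifInErrors on igb1, taken 60 seconds apart
print(counter_delta(100, 150))              # plain increase
print(counter_delta(4294967290, 10))        # wrapped past 2**32
print(errors_per_second(0, 120, 60.0))
```

If this delta stays at zero while Checkmk still alerts, that points at how the values are interpreted; if it grows, the device itself is counting errors.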
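To compare both ends of each link, it can help to pair every pfSense LAGG member with the switch port it is cabled to and put the counter deltas side by side. A minimal sketch, assuming you have already collected the `ifInErrors` deltas on the pfSense side and the `ifOutErrors` deltas on the switch side; the cabling map and port names such as `port1` are hypothetical placeholders:

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    local_port: str       # pfSense interface, e.g. a LAGG member
    peer_port: str        # switch port it is cabled to (hypothetical name)
    local_in_errors: int  # ifInErrors delta on the pfSense side
    peer_out_errors: int  # ifOutErrors delta on the switch side


def compare_links(cabling: dict, local_in: dict, peer_out: dict) -> list:
    """Line up both ends of each link; missing counters default to 0."""
    rows = []
    for local, peer in sorted(cabling.items()):
        rows.append(LinkSample(local, peer,
                               local_in.get(local, 0),
                               peer_out.get(peer, 0)))
    return rows


def only_receiver_counts(rows: list) -> list:
    """Links where pfSense counts in-errors but the peer shows no out-errors."""
    return [r for r in rows if r.local_in_errors > 0 and r.peer_out_errors == 0]


# Hypothetical deltas over the same polling interval
cabling = {"igb1": "port1", "igb2": "port2"}
rows = compare_links(cabling, {"igb1": 12, "igb2": 0}, {"port1": 0, "port2": 0})
for row in only_receiver_counts(rows):
    print(row.local_port, "->", row.peer_port)
```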