[BUG] 1.6.0p19 only - LNX Bonding Interface Check - No active interface

CMK version : 1.6.0p19 Raw
OS version : Oracle Linux 8.2

We have some bonding interfaces, and with Checkmk version 1.6.0p19 their checks go CRIT.
For LACP: CRIT - No active interface (CRIT), Mode: IEEE 802.3ad Dynamic link aggregation, eth2/ec:f4:bb:c8:db:8a up, eth1/ec:f4:bb:c8:db:88 up, Bond status: up
And for load balancing: CRIT - No active interface (CRIT), Mode: load balancing, eth3/ec:f4:bb:dc:7b:93 up, eth2/ec:f4:bb:dc:7b:92 up, eth1/ec:f4:bb:dc:7b:91 up, Bond status: up

full output for LACP

<<<lnx_bonding:sep(58)>>>
==> ./bond0 <==
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: ec:f4:bb:c8:db:88
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 15
Partner Key: 32869
Partner Mac Address: 00:23:04:ee:be:61

Slave Interface: eth1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ec:f4:bb:c8:db:88
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: ec:f4:bb:c8:db:88
port key: 15
port priority: 255
port number: 1
port state: 61
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:61
oper key: 32869
port priority: 32768
port number: 16641
port state: 61

Slave Interface: eth2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ec:f4:bb:c8:db:8a
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: ec:f4:bb:c8:db:88
port key: 15
port priority: 255
port number: 2
port state: 61
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:61
oper key: 32869
port priority: 32768
port number: 257
port state: 61

full output for Loadbalancing

<<<lnx_bonding:sep(58)>>>
==> ./bond0 <==
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 2000
Down Delay (ms): 2000

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 10
Permanent HW addr: ec:f4:bb:dc:7b:91
Slave queue ID: 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 11
Permanent HW addr: ec:f4:bb:dc:7b:92
Slave queue ID: 0

Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 11
Permanent HW addr: ec:f4:bb:dc:7b:93
Slave queue ID: 0

We didn’t set any parameters for this check, and both bonding interfaces are working fine.
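Neither dump above contains a "Currently Active Slave" line. As far as I know, the bonding driver only prints that line in modes that use a primary interface (active-backup, for example), so for 802.3ad and round-robin there is simply nothing to report. The following is a minimal Python sketch of how such a section could be parsed, to show where the missing value comes from. It is not the actual Checkmk parse function; `parse_bond`, the dictionary layout and the shortened sample are assumptions made for illustration.

```python
# Minimal sketch: split an lnx_bonding section on ":" (sep(58) is ASCII 58)
# and collect the fields a check could look at.
# NOT the real Checkmk parser; names and structure are hypothetical.

SAMPLE = """\
==> ./bond0 <==
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up

Slave Interface: eth1
MII Status: up
Permanent HW addr: ec:f4:bb:dc:7b:91

Slave Interface: eth2
MII Status: up
Permanent HW addr: ec:f4:bb:dc:7b:92
"""


def parse_bond(text):
    """Return a dict describing one bond (hypothetical structure)."""
    bond = {"mode": None, "status": None, "active": None, "interfaces": {}}
    current = None  # name of the slave block we are currently in, if any
    for line in text.splitlines():
        if ":" not in line:
            continue  # header and blank lines carry no key/value pair
        # Split on the first colon only, so MAC addresses stay intact.
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Bonding Mode":
            bond["mode"] = value
        elif key == "Currently Active Slave":
            bond["active"] = value
        elif key == "Slave Interface":
            current = value
            bond["interfaces"][current] = {}
        elif key == "MII Status":
            if current is None:
                bond["status"] = value  # status of the bond itself
            else:
                bond["interfaces"][current]["status"] = value
    return bond


bond = parse_bond(SAMPLE)
print(bond["active"])              # None
print(sorted(bond["interfaces"]))  # ['eth1', 'eth2']
```

With this kind of input the "active" field stays empty even though the bond and all of its interfaces are up, which matches the check output quoted at the top.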

According to the checkmk internal changelog there was:

Werk #11543 - Wrong handling of ‘None’ values in bonding checks

But I can’t find this on https://checkmk.com/check_mk-werks.php?edition_id=raw&branch=1.6.0

How can I get rid of this alarm?

Hi,
I have the same problem with all of my servers (OS: Ubuntu 20.04, Ubuntu 18.04, Debian 9).
I hope there is a fast solution!

Was it working before p19? I only have a p18 here to test with, and there is no problem on it at the moment.

I think I found the reason. The mentioned werk broke the Linux bonding check, because that check has its own parse function and does not use the generic parse function for bonding interfaces.

The problem is the following commit.

As a quick fix, take the file “checks/bonding.include” from a p18 version and copy it to the folder “~/local/share/check_mk/checks/”. It should then work again. Once this is fixed in p20, you can remove the manually copied file.

Hi,
it was working in p17.

Hi there - sorry for the inconvenience with that false positive behavior.

Werk 11543 fixes a “false negative” behavior by handling None and "None". Unfortunately it also introduces the false assumption that no active interface should be treated as CRIT.
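To make the difference concrete, here is a small Python sketch of the two behaviors described above. It is not the code from checks/bonding.include; both function names and the "Active interface" message are hypothetical and only illustrate the idea.

```python
# Hypothetical sketch of the behavior described in this post.
# NOT the real check code from checks/bonding.include.

def check_active_p19_like(active):
    """Treat a missing active interface (None or "None") as CRIT."""
    if active in (None, "None"):
        return 2, "No active interface"  # state 2 = CRIT
    return 0, "Active interface: %s" % active


def check_active_informational(active):
    """Report a missing active interface, but keep the state at OK."""
    if active in (None, "None"):
        return 0, "No active interface"  # state 0 = OK
    return 0, "Active interface: %s" % active


# A healthy 802.3ad or round-robin bond reports no active interface at all.
print(check_active_p19_like(None))       # (2, 'No active interface')
print(check_active_informational(None))  # (0, 'No active interface')
```

The first variant turns every bond without an active interface into a CRIT service, regardless of the bond status; the second keeps the text but leaves the overall state to the other parts of the check.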

This will be fixed in p20. In the meantime you can work around this behavior by copying checks/bonding.include to ~/local/share/check_mk/checks/ (as suggested by Andreas, but taking the p19 version) and changing yield 2, "No active interface" in line 67 to yield 0, "No active interface".
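The copy-and-edit step can also be scripted. Below is a minimal Python 3 sketch under a few assumptions: `install_patched_copy` is a hypothetical helper, it replaces the string wherever it occurs instead of relying on the line number, and the source and target paths are passed in by the caller.

```python
import shutil
from pathlib import Path

# The two strings come from the workaround described above.
OLD = 'yield 2, "No active interface"'
NEW = 'yield 0, "No active interface"'


def install_patched_copy(src, dst_dir):
    """Copy src into dst_dir and downgrade the 'No active interface' state.

    Returns the number of occurrences that were replaced, so the caller
    can verify that the expected line was actually found.
    """
    src = Path(src)
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    shutil.copy2(src, dst)  # the shipped file itself is left untouched
    text = dst.read_text()
    count = text.count(OLD)
    dst.write_text(text.replace(OLD, NEW))
    return count
```

Run as the site user, it could be called with the p19 copy of bonding.include as `src` and `Path.home() / "local/share/check_mk/checks"` as `dst_dir`. I am assuming the shipped file sits under `~/share/check_mk/checks/` in the site; please check that on your installation. If the function returns 0, nothing was changed and the file should be edited by hand. Remove the local copy again once p20 is installed.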

hth, Frans

Thanks for the answer, I had no time to dig deeper into the problem :slight_smile:

Thanks, the copy workaround is working, looking forward to p20. :slight_smile:

Hi. I have the same issue. The workaround works only partially: all bonding interfaces are flapping between OK and CRIT every few minutes.

We have to wait for 1.6.0p20 to resolve this issue.