BUG: check_mk-brocade_vdx_status

Hi,

This check, which reads OID “1.3.6.1.4.1.1588.2.1.1.1.1.7”, reports OK, but it shows only part of the truth.
The stor2rrd monitoring software shows in its GUI that the state is not OK. I think this software looks deeper and can detect that one or more network connections were disconnected for a short while.

In the stor2rrd GUI the same state is shown as not OK, but the integer from 1.3.6.1.4.1.1588.2.1.1.1.1.7 does not change. So CheckMK is unable to detect the fault, while stor2rrd detects it correctly.

I don’t know why stor2rrd can detect the fault, but I know CheckMK cannot find it.

Kind regards
Thomas

Can you provide a little more information, such as your output compared to the output from stor2rrd?
The mentioned OID is the operational status of the complete switch.
If there is an entry in an error log or similar, it will not affect the operational state of the switch.
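To illustrate why this value can stay unchanged while a fan or power supply fails, here is a minimal sketch of how the integer could be interpreted. The enumeration (1 = online, 2 = offline, 3 = testing, 4 = faulty) is an assumption based on my reading of the Brocade SW-MIB, and `describe_oper_status` is a hypothetical helper, not the code of the existing check.

```python
# Assumed swOperStatus enumeration (from my reading of the Brocade SW-MIB);
# please verify it against the MIB file shipped with your firmware.
OPER_STATES = {1: "online", 2: "offline", 3: "testing", 4: "faulty"}


def describe_oper_status(raw):
    """Hypothetical helper: map the raw SNMP integer to (name, is_ok)."""
    value = int(raw)
    name = OPER_STATES.get(value, "unknown")
    # Only "online" is treated as OK in this sketch.
    return name, value == 1


# A switch that still forwards traffic would answer 1 here,
# regardless of what the sensors report.
print(describe_oper_status("1"))
```

The point is only that this single integer describes the switch as a whole, so a sensor fault does not have to show up in it.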

These are the event log entries.

2021/11/16-15:18:55, [EM-1034], 124, CHASSIS, ERROR, DS_7720B, PS 2 set to faulty, rc=2000e.

2021/11/16-15:19:10, [EM-1037], 125, CHASSIS, INFO, DS_7720B, PS 2 is Ok.

2021/11/16-15:19:16, [MAPS-1021], 126, FID 128, WARNING, mySWITCHNAME, RuleName=defCHASSISBAD_FAN_MARG, Condition=CHASSIS(BAD_FAN>=1), Obj:Chassis [ BAD_FAN,1] has contributed to switch status MARGINAL.

2021/11/16-15:19:16, [MAPS-1020], 127, FID 128, WARNING, mySWITCHNAME, Switch wide status has changed from HEALTHY to MARGINAL.

2021/11/16-15:19:27, [EM-1034], 128, CHASSIS, ERROR, DS_7720B, PS 1 set to faulty, rc=2000e.

2021/11/16-15:19:37, [EM-1037], 129, CHASSIS, INFO, DS_7720B, PS 1 is Ok.

2021/11/16-15:20:16, [MAPS-1021], 130, FID 128, WARNING, mySWITCHNAME, RuleName=defCHASSISBAD_FAN_CRIT, Condition=CHASSIS(BAD_FAN>=2), Obj:Chassis [ BAD_FAN,2] has contributed to switch status CRITICAL.

2021/11/16-15:20:16, [MAPS-1020], 131, FID 128, WARNING, mySWITCHNAME, Switch wide status has changed from MARGINAL to CRITICAL.

I can imagine how stor2rrd detects this kind of alert. Look at chapter 4.8 of the PDF document from Brocade.

Those are all health state messages, not the operational state.
The problem is easy to describe: at the moment there are no checks for the sensor states on a Brocade VDX; only MLX devices are known to CMK.
You don’t need to parse the event messages. It would be enough to have a check that reads the OID table under .1.3.6.1.4.1.1588.2.1.1.1.1.22. This is the sensor table with all temperature, fan and power sensors.
In this table you will find your defective parts. But as I said, it is not available as a check at the moment. So the title should be more like “Feature request for sensor monitoring on Brocade VDX devices”. The “brocade_vdx_status” check has no bug, as it only shows the operational state and not the health state of the switch.
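
As a rough idea of what such a check would have to do, here is a minimal sketch that evaluates a numeric walk of the sensor table (for example taken with `snmpwalk -On -Oe`). It is not an existing Checkmk plugin. The column layout (2 = type, 3 = status, 5 = info text) and the enumerations are assumptions based on my reading of the Brocade SW-MIB, the function names are hypothetical, and the sample lines are invented for illustration.

```python
import re

# Assumed enumerations from the Brocade SW-MIB; verify against your MIB file.
SENSOR_TYPES = {1: "temperature", 2: "fan", 3: "power-supply"}
SENSOR_STATES = {1: "unknown", 2: "faulty", 3: "below-min",
                 4: "nominal", 5: "above-max", 6: "absent"}
BAD_STATES = {2, 3, 5}  # design choice of this sketch: faulty, below-min, above-max

# Assumed table entry OID: the table from the post plus ".1" for the entry.
ENTRY = ".1.3.6.1.4.1.1588.2.1.1.1.1.22.1"
LINE_RE = re.compile(
    r"^" + re.escape(ENTRY) + r"\.(\d+)\.(\d+)\s*=\s*(?:\w+:\s*)?(.*)$"
)

# Invented sample output, not taken from a real switch.
SAMPLE = """
.1.3.6.1.4.1.1588.2.1.1.1.1.22.1.2.1 = INTEGER: 2
.1.3.6.1.4.1.1588.2.1.1.1.1.22.1.3.1 = INTEGER: 2
.1.3.6.1.4.1.1588.2.1.1.1.1.22.1.5.1 = STRING: "FAN #1"
.1.3.6.1.4.1.1588.2.1.1.1.1.22.1.2.2 = INTEGER: 3
.1.3.6.1.4.1.1588.2.1.1.1.1.22.1.3.2 = INTEGER: 4
.1.3.6.1.4.1.1588.2.1.1.1.1.22.1.5.2 = STRING: "Power Supply #1"
"""


def parse_sensor_walk(text):
    """Group walk lines into {sensor_index: {column: value}}."""
    rows = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and OIDs outside the sensor table
        column, index = int(match.group(1)), int(match.group(2))
        rows.setdefault(index, {})[column] = match.group(3).strip().strip('"')
    return rows


def faulty_sensors(rows):
    """Return (index, type, state, info) for every sensor in a bad state."""
    problems = []
    for index in sorted(rows):
        row = rows[index]
        status = int(row.get(3, 1))  # a missing status column counts as "unknown"
        if status in BAD_STATES:
            sensor_type = SENSOR_TYPES.get(int(row.get(2, 0)), "unknown")
            problems.append((index, sensor_type, SENSOR_STATES[status], row.get(5, "")))
    return problems


print(faulty_sensors(parse_sensor_walk(SAMPLE)))
```

Whether “absent” or “unknown” sensors should also raise a warning is a decision for whoever writes the real check; this sketch ignores them.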
