[Check_mk (english)] brocade_fcport's "no TX buffer credits"

Hello,

I'm getting a lot of WARNING and CRITICAL states about the ratio of "no tx
credits" on ports on Brocade switches. I suspect something is wrong in check_mk:

$ check_mk -n -v -p FC_SWITCH_HOSTNAME
....
Port 18 CRIT - assuming 2Gbit/s, In: 158.47KB/s, Out: 1.74MB/
s, no TX buffer credits: 90.51%(!!), Phy:inSync(6), Op:online(1),
Adm:online(1) (in=162273.698617;;;0;200000000
out=1822393.279242;;;0;200000000 rxframes=756.367023;;;;
txframes=1320.823425;;;; rxcrcs=0;;;; rxencoutframes=0;;;;
c3discards=0;;;; notxcredits=12604.370683;;;;)
....

And in the web interface (a bit later; so the numbers don't match
completely):
....
Service state: CRIT
Output of check plugin: CRIT - assuming 2Gbit/s, In: 228.34KB/s, Out:
950.65KB/s, no TX buffer credits: 96.31%CRIT, Phy:inSync(6), Op:online
(1), Adm:online(1)
Service performance data: in=233820.355617;;;0;200000000
out=973468.242159;;;0;200000000 rxframes=518.385831;;;;
txframes=753.48073;;;; rxcrcs=0;;;; rxencoutframes=0;;;; c3discards=0;;;;
notxcredits=19660.374087;;;;
Service check command: check_mk-brocade_fcport
....
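
The performance data above follows the usual `label=value;warn;crit;min;max` layout, so the per-second rates can be pulled out mechanically. A minimal sketch, assuming the values carry no unit suffix (as is the case here); the function name `parse_perfdata` is made up for illustration and is not part of check_mk:

```python
def parse_perfdata(perfdata: str) -> dict[str, float]:
    """Map each perfdata label to its value, ignoring warn/crit/min/max."""
    values = {}
    for token in perfdata.split():
        label, _, rest = token.partition("=")
        # The value is the first ';'-separated field (assumed to have no unit).
        values[label] = float(rest.split(";")[0])
    return values


# Performance data as shown by the web interface above.
perfdata = (
    "in=233820.355617;;;0;200000000 out=973468.242159;;;0;200000000 "
    "rxframes=518.385831;;;; txframes=753.48073;;;; rxcrcs=0;;;; "
    "rxencoutframes=0;;;; c3discards=0;;;; notxcredits=19660.374087;;;;"
)
rates = parse_perfdata(perfdata)
print(rates["txframes"], rates["notxcredits"])
```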

Note from the check_mk output above:

rxframes      =   756
txframes      = +1321
                -----
sum of frames =  2077
                 ====

notxcredits = 12604

Taken at face value, this means the port has almost an order of
magnitude more "no tx credits" events than transmitted and received
frames combined; that's weird. A port like that would certainly be in
bad shape, I think. But the switch's administration interface doesn't
have anything bad to say about the port. Things connected to the port
seem to work well. And when I manually snmpwalk the device and look at
the rx/tx counters and the no_tx_credits counts, I only see ratios
below 1%.

I have started looking at the code for brocade_fcport, but I must say I
have trouble seeing how the peculiar numbers are calculated.
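
One formula that happens to reproduce both percentages quoted above is notxcredits / (notxcredits + txframes). The sketch below only checks that arithmetic against the two sets of performance data; the formula is inferred from the numbers and is an assumption, not something confirmed against the brocade_fcport source, and the function name is hypothetical:

```python
def no_tx_credit_ratio(notxcredits: float, txframes: float) -> float:
    """Assumed formula: share of 'no credit' events among all TX events."""
    total = notxcredits + txframes
    return notxcredits / total if total else 0.0


# Rates (per second) taken from the two outputs quoted above.
cli = no_tx_credit_ratio(12604.370683, 1320.823425)   # check_mk -n -v -p
web = no_tx_credit_ratio(19660.374087, 753.48073)     # web interface

print(f"{cli:.2%} {web:.2%}")  # prints "90.51% 96.31%"
```

Under this reading the percentage is not compared against the total frame count (rx plus tx), which would explain why it can be very high even when the frame rates look modest.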

At this point, we have 13 FC ports in state CRITICAL and 6 FC ports on a
handful of different Brocade FC switches in state WARNING because of "no
TX buffer credits"; if this were really true, I would expect a lot of
derived operational problems - which doesn't seem to be the case.

Does someone have comments about "no tx credits" and the brocade_fcport
check?

--
Regards,
Troels Arvin <troels@arvin.dk>
http://troels.arvin.dk/

As follow-up:

At 12:26:
SW-MIB::swFCPortTxFrames.18 = Counter32: 3966693971
SW-MIB::swFCPortRxFrames.18 = Counter32: 2908513707
SW-MIB::swFCPortNoTxCredits.18 = Counter32: 72177326

At 15:50:
SW-MIB::swFCPortTxFrames.18 = Counter32: 4291732356
SW-MIB::swFCPortRxFrames.18 = Counter32: 2917664796
SW-MIB::swFCPortNoTxCredits.18 = Counter32: 74309947

Deltas:
SW-MIB::swFCPortTxFrames.18: 325038385
SW-MIB::swFCPortRxFrames.18: 9151089
SW-MIB::swFCPortNoTxCredits.18: 2132621
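
The deltas above can be recomputed from the two snmpwalk samples. A small sketch, assuming at most one Counter32 wrap between samples (swFCPortTxFrames is already close to 2^32 in the second sample); the helper name `counter32_delta` is made up for illustration:

```python
COUNTER32_MODULUS = 2**32


def counter32_delta(earlier: int, later: int) -> int:
    """Difference between two Counter32 samples, tolerating a single wrap."""
    return (later - earlier) % COUNTER32_MODULUS


# (value at 12:26, value at 15:50) as listed above.
samples = {
    "swFCPortTxFrames.18": (3966693971, 4291732356),
    "swFCPortRxFrames.18": (2908513707, 2917664796),
    "swFCPortNoTxCredits.18": (72177326, 74309947),
}

deltas = {name: counter32_delta(a, b) for name, (a, b) in samples.items()}
ratio = deltas["swFCPortNoTxCredits.18"] / deltas["swFCPortTxFrames.18"]

for name, delta in deltas.items():
    print(f"SW-MIB::{name}: {delta}")
print(f"NoTxCredits per transmitted frame: {ratio:.2%}")
```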

And still, I get a very red CRITICAL regarding "no TX buffer credits" for
port 18 almost constantly.

I wonder how to debug this further.

--
Regards,
Troels Arvin <troels@arvin.dk>
http://troels.arvin.dk/

Hello again,

On Thursday, January 12, I wrote:

> Deltas:
> SW-MIB::swFCPortTxFrames.18: 325038385
> SW-MIB::swFCPortRxFrames.18: 9151089
> SW-MIB::swFCPortNoTxCredits.18: 2132621

Please disregard the above; they reflect an off-by-one error in my
reading of SNMP values (Brocade numbers ports from 0 in most cases, but
when viewing via SNMP, they start from 1.)
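
The index shift can be made explicit when building the names to query. A minimal sketch, assuming the switch numbers its ports from 0 (true "in most cases" only, as noted above); both helper names are hypothetical:

```python
def snmp_index_for_port(port_number: int) -> int:
    """SNMP table index for a port, assuming ports are numbered from 0."""
    return port_number + 1


def oid_for_port(column: str, port_number: int) -> str:
    """Symbolic SW-MIB name for one counter of one port."""
    return f"SW-MIB::{column}.{snmp_index_for_port(port_number)}"


# Port 18 on the switch is index 19 in the SNMP table.
print(oid_for_port("swFCPortNoTxCredits", 18))  # SW-MIB::swFCPortNoTxCredits.19
```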

It seems check_mk is right about our FC switches really having enormous
amounts of NoTxCredits on some of the ports.

--
Regards,
Troels Arvin <troels@arvin.dk>
http://troels.arvin.dk/

Hi,

> Hello again,
>
> > Deltas:
> > SW-MIB::swFCPortTxFrames.18: 325038385
> > SW-MIB::swFCPortRxFrames.18: 9151089
> > SW-MIB::swFCPortNoTxCredits.18: 2132621
>
> Please disregard the above; they reflect an off-by-one error in my
> reading of SNMP values (Brocade numbers ports from 0 in most cases, but
> when viewing via SNMP, they start from 1.)
>
> It seems check_mk is right about our FC switches really having enormous
> amounts of NoTxCredits on some of the ports.

Thanks for updating us. This check is rather hard to validate short of
running a full SAN including bottlenecks - I'm glad the check was correct
after all.

Regarding your numbers - the NoTxCredits rate is under 1%, which is not
*that* bad, unless you are indeed seeing latency issues?
Is it some kind of long range link or an ISL?

Florian

On Mon, 16 Jan 2012 10:40:08 +0000 (UTC), Troels Arvin <troels@arvin.dk> wrote:

On Thursday, January 12, I wrote

--
Florian Heigl
Mathias Kettner GmbH, Linux Beratung & Schulung
Steinstr. 44, 81667 München
Tel.: 089 / 1890 4210
Fax.: 089 / 1890 4211
Mail: fh@mathias-kettner.de
http://mathias-kettner.de

Correction: not under 1%, but around 1%.

On Mon, 16 Jan 2012 11:53:47 +0100, Florian Heigl <fh@mathias-kettner.de> wrote:

> Regarding your numbers - the NoTxCredits rate is under 1%, which is not
> *that* bad, unless you are indeed seeing latency issues?
> Is it some kind of long range link or an ISL?

--
Florian Heigl
Mathias Kettner GmbH, Linux Beratung & Schulung
Steinstr. 44, 81667 München
Tel.: 089 / 1890 4210
Fax.: 089 / 1890 4211
Mail: fh@mathias-kettner.de
http://mathias-kettner.de

Hello,

Florian Heigl wrote:

> Regarding your numbers - the NoTxCredits rate is under 1%

Yes, and those numbers were for the wrong port. The problem is described
better here:

- But the bottom line is that check_mk seems to be right in pointing out a
problem. (check_mk's brocade code may need better link speed detection,
though; I may return with more on that later.)

--
Regards,
Troels Arvin <troels@arvin.dk>
http://troels.arvin.dk/