BGP peer monitoring for multiple VRFs

**CMK version:** 2.0.0p15

Is anyone monitoring BGP peers that live across multiple VRFs on Arista switches?

We’ve tried the native Arista BGP plugin, and we’ve also tried this plugin from the Checkmk Exchange.

With the native BGP plugin, we haven’t been able to monitor state changes so far. Looking at the code, it looks like state changes aren’t baked into it. Is there a ruleset we need to create to map those state changes?
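For reference, this is roughly what "mapping state changes" means in a check. The numeric peer states below are the ones defined for bgpPeerState in the standard BGP4-MIB; the mapping to monitoring states (`MONITORING_STATE`) and the function name are assumptions for illustration only, not code from either plugin.

```python
from typing import Tuple

# bgpPeerState values as defined in the standard BGP4-MIB.
BGP_PEER_STATE = {
    1: "idle",
    2: "connect",
    3: "active",
    4: "opensent",
    5: "openconfirm",
    6: "established",
}

# Hypothetical defaults (0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN).
# Any state not listed here falls back to WARN in this sketch.
MONITORING_STATE = {"established": 0, "idle": 2}


def peer_state_to_monitoring(raw_state: int) -> Tuple[int, str]:
    """Translate a raw SNMP peer state into (monitoring state, state name)."""
    name = BGP_PEER_STATE.get(raw_state)
    if name is None:
        return 3, f"unknown peer state {raw_state}"
    return MONITORING_STATE.get(name, 1), name


print(peer_state_to_monitoring(6))
print(peer_state_to_monitoring(1))
```

A plugin that exposes such a mapping as a rule parameter would let you choose the severity per peer state; whether a given plugin does so has to be checked in its code.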

We’re also only seeing the default VRF peers by default. Under the [SNMPv3 contexts to use in requests] ruleset, we can specify a VRF for the plugin, and then just that specific VRF’s BGP peers show up. But if we try to include all VRFs in one rule it doesn’t work, and if we have multiple rules with 1 VRF per rule, the service discovery crashes.

We tried the same steps with this plugin from the Checkmk Exchange as well.
That plugin at least maps the state changes within the code, so that’s useful, but we’re still only able to monitor 1 VRF.

We tried copying that file and renaming some variables to essentially have a plugin for each VRF, but that’s not quite working. It seems like the plugins are fighting each other. If we specify the context for 1 VRF in one plugin and a different VRF in the copied plugin, both plugins poll the same VRF even though the rules specify different ones. Not sure exactly why that’s happening yet.

I feel like we’re making this harder than it should be, and I’m wondering if there’s something easier that I’m missing…

The BGP plugin from the Exchange (btw. you will find the latest version of this plugin here: BGP peer on my Gitlab) works only with the standard BGP4-MIB. This MIB has no knowledge of VRFs or address families, so in this case you need to work with contexts :-(. Judging by the contents of the original check, the Arista MIB also has no VRF information. And the address family information in that MIB is not used by the original check, which means contexts there too.
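To illustrate why the BGP4-MIB cannot tell VRFs apart: its peer table is indexed only by the peer's remote IPv4 address, so the OID carries nothing but the column and that address. A minimal sketch (the helper name `peer_index` is made up for this example):

```python
# bgpPeerRemoteAddr column of the BGP4-MIB peer table.
BGP_PEER_REMOTE_ADDR = ".1.3.6.1.2.1.15.3.1.7"


def peer_index(oid: str, column: str = BGP_PEER_REMOTE_ADDR) -> str:
    """Return the table index of an OID, i.e. everything after the column."""
    prefix = column + "."
    if not oid.startswith(prefix):
        raise ValueError(f"{oid} is not in column {column}")
    return oid[len(prefix):]


# The index is just the peer address; there is no VRF component in it.
print(peer_index(".1.3.6.1.2.1.15.3.1.7.192.168.10.138"))
```

Because the index has no VRF part, the VRF has to be selected outside the OID, which is what the SNMPv3 context does.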

Oh cool thanks for that!

With your plugin, if I specify which VRF via an [SNMPv3 contexts to use in requests] rule, those VRF peers will be pulled. So being able to pick up the non default VRF peers is possible.

I copied the code from your plugin and renamed some variables so that, in theory, the original would pick up the default VRF peers and the copy would pick up the custom VRF via the context rule, but that’s not working. If I just have a rule that specifies the custom VRF for the new plugin, both plugins still pick up only the default VRF. I can only pull the custom VRF peers if I have a rule for both plugins specifying the same custom VRF. So it seems like I’m not doing something right to make that work.

Does that seem like a good method to keep pursuing: having a separate plugin for each VRF and just figuring out how to stop the context rules from conflicting with each other?

There is no need to create a plugin for each context. Here you will find the CMK doc on how to use snmp contexts.

Sorry that was what I was trying to explain.

While using 1 plugin:

If you don’t specify a VRF in the [SNMPv3 contexts to use in requests] ruleset, it will pull the default VRF

If you specify all VRFs under 1 rule, it will still only pull the default VRF

If you have multiple rules that each specify 1 VRF, it will only pull the VRF that’s highest in the ruleset
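The last observation is what you would see if the ruleset were evaluated "first matching rule wins". The following is only a toy model of that behaviour, not Checkmk's actual code; the rule layout, the function name, and the VRF names are assumptions.

```python
from typing import List, Optional, Tuple

# A rule in this toy model: (section name, or None for "all sections", contexts).
Rule = Tuple[Optional[str], List[str]]


def contexts_first_match(rules: List[Rule], section: str) -> List[str]:
    """Return the contexts of the first rule that applies to the section."""
    for rule_section, contexts in rules:
        if rule_section is None or rule_section == section:
            return contexts
    return [""]  # no rule matched: query without a context


# Two rules with one VRF each: only the topmost rule is used.
print(contexts_first_match([("bgp_peer", ["VRF_A"]), ("bgp_peer", ["VRF_B"])], "bgp_peer"))
# One rule listing both VRFs: both contexts are returned in this model.
print(contexts_first_match([("bgp_peer", ["VRF_A", "VRF_B"])], "bgp_peer"))
```

Under this model a single rule with several contexts would be expected to query all of them, so the model does not explain the second observation.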

I have now played a little with the SNMP contexts myself. I created two VRFs (VRF1 and VRF2), added 3 BGP peers (one in each VRF and one in the default VRF), and added a context to each VRF (SNMP_VRF1 and SNMP_VRF2). On the CLI I could verify that I got (only) the data from the associated VRF.

OMD[build]:~$ snmpwalk -v3 -u checkmk -l authPriv -a SHA -A secret -x AES -X secret ro01 .1.3.6.1.2.1.15.3.1.7 -n SNMP_VRF1
SNMPv2-SMI::mib-2.15.3.1.7.192.168.10.138 = IpAddress: 192.168.10.138
OMD[build]:~$ snmpwalk -v3 -u checkmk -l authPriv -a SHA -A secret -x AES -X secret ro01 .1.3.6.1.2.1.15.3.1.7 -n SNMP_VRF2
SNMPv2-SMI::mib-2.15.3.1.7.192.168.10.141 = IpAddress: 192.168.10.141

and without an SNMP context I got all three peers (this is an issue with my device type)

~$ snmpwalk -v3 -u checkmk -l authPriv -a SHA -A secret -x AES -X secret ro01 .1.3.6.1.2.1.15.3.1.7
SNMPv2-SMI::mib-2.15.3.1.7.192.168.10.127 = IpAddress: 192.168.10.127
SNMPv2-SMI::mib-2.15.3.1.7.192.168.10.138 = IpAddress: 192.168.10.138
SNMPv2-SMI::mib-2.15.3.1.7.192.168.10.141 = IpAddress: 192.168.10.141
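If you want to compare such walks without reading them by eye, the peer addresses can be pulled out of the output with a few lines of Python. This is just a sketch that assumes the output format shown above (`... = IpAddress: a.b.c.d`); the function name is made up.

```python
import re
from typing import List

# Matches the value part of a line like "... = IpAddress: 192.168.10.138".
_IP_VALUE = re.compile(r"=\s*IpAddress:\s*(\d+\.\d+\.\d+\.\d+)\s*$")


def peers_from_walk(output: str) -> List[str]:
    """Extract the peer addresses from snmpwalk output, in order."""
    peers = []
    for line in output.splitlines():
        match = _IP_VALUE.search(line)
        if match:
            peers.append(match.group(1))
    return peers


walk_without_context = """\
SNMPv2-SMI::mib-2.15.3.1.7.192.168.10.127 = IpAddress: 192.168.10.127
SNMPv2-SMI::mib-2.15.3.1.7.192.168.10.138 = IpAddress: 192.168.10.138
SNMPv2-SMI::mib-2.15.3.1.7.192.168.10.141 = IpAddress: 192.168.10.141
"""

print(peers_from_walk(walk_without_context))
```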

Then I created a rule for the bgp_peer section to use the contexts SNMP_VRF1 and SNMP_VRF2.
After rediscovery of my device, CMK found the two peers from the VRFs


What’s missing is the default/global VRF. So basically CMK is working as expected.
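Using the walks above, the peer that is not covered by either context can be worked out as a simple set difference. Note the assumption: this only works on a device that returns the peers of all VRFs when no context is given, which I described above as a quirk of my device type. The function name is made up for this sketch.

```python
from typing import Dict, Iterable, List


def peers_outside_contexts(
    all_peers: Iterable[str], peers_by_context: Dict[str, List[str]]
) -> List[str]:
    """Return the peers that no configured context accounts for."""
    scoped = set()
    for peers in peers_by_context.values():
        scoped.update(peers)
    return sorted(set(all_peers) - scoped)


# Data taken from the sample walks above.
without_context = ["192.168.10.127", "192.168.10.138", "192.168.10.141"]
by_context = {
    "SNMP_VRF1": ["192.168.10.138"],
    "SNMP_VRF2": ["192.168.10.141"],
}

print(peers_outside_contexts(without_context, by_context))
```

With the data from the walks this leaves 192.168.10.127, the peer in the default VRF.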

When you say you added a context to each VRF, is that just a config on the VRF?

Yes, this is a piece of the configuration that attaches a “label” to the VRF so you can identify it in the SNMPv3 request.

Which vendor switch did you test this on?

We’re using Arista L3 switches. If I specify the name of one of the VRFs in the context rule, it will pull those BGP peers. But if I have multiple VRF names in the context rule, it will only pull the first VRF listed in the rule

I used a Cisco device. It might be that on Arista the VRF name also serves as the context name.

Which OS version are your devices running (just curious)?
In the Arista SNMP config guide (search for context) I found a context option in the SNMP group configuration, but no hint on how to use it :frowning:

Strange, I have two VRFs and in the rule two contexts and CMK fetches both VRFs. I have used a slightly newer CMK version (2.0.0p29) than yours.

Could you verify on the CLI that the snmpwalk delivers the correct data if you use the VRF names as the context option (-n <your_context/vrf_name>), see my sample walks above please?
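If you have more than a handful of VRFs, the per-context commands can be generated instead of typed. A small sketch that only builds the command lines; it uses the same snmpwalk options as my sample walks, and the credentials, the host name `es01`, and the VRF names `VRF_A`/`VRF_B` are placeholders.

```python
from typing import List, Optional

# bgpPeerRemoteAddr column, as used in the sample walks.
BGP_PEER_REMOTE_ADDR = ".1.3.6.1.2.1.15.3.1.7"


def snmpwalk_cmd(
    host: str, context: Optional[str] = None, oid: str = BGP_PEER_REMOTE_ADDR
) -> List[str]:
    """Build an SNMPv3 snmpwalk command, optionally scoped to a context."""
    cmd = [
        "snmpwalk", "-v3", "-u", "checkmk", "-l", "authPriv",
        "-a", "SHA", "-A", "secret", "-x", "AES", "-X", "secret",
    ]
    if context:
        cmd += ["-n", context]  # the context selects the VRF
    return cmd + [host, oid]


# One command without a context, then one per (placeholder) VRF name.
for vrf in (None, "VRF_A", "VRF_B"):
    print(" ".join(snmpwalk_cmd("es01", vrf)))
```

The returned list can be passed to `subprocess.run` as is if you want to execute the walks from the script.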

We have some on different versions but one of them is on 4.27.7.1M

We’re actually on p29 in our dev instance and that has the same issue for us

Yeah if I run that command via CLI and specify the VRF names, it’s pulling the correct BGP peers

Can you check on the CLI if

cmk --no-cache --plugins bgp_peer -vI <your_host_as_in_cmk>

drops any error like

ERROR: SNMP error 0/-17 (Bad context specified)

If there is no error can you check with

cmk --no-cache --plugins bgp_peer -nnvvvI <your_host_as_in_cmk> | grep '".1.3.6.1.2.1.15.3.1.3" on '

how often CMK tries to fetch the BGP peer data? There should be one try per configured context. In my case I have 4 contexts configured

cmk --no-cache --plugins bgp_peer -nnvvvI ro01 | grep '".1.3.6.1.2.1.15.3.1.3" on '
Executing BULKWALK of ".1.3.6.1.2.1.15.3.1.3" on ro01
Executing BULKWALK of ".1.3.6.1.2.1.15.3.1.3" on ro01
Executing BULKWALK of ".1.3.6.1.2.1.15.3.1.3" on ro01
Executing BULKWALK of ".1.3.6.1.2.1.15.3.1.3" on ro01
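The same count can be done in a script if you save the cmk output to a file or variable. This sketch assumes the log line format shown above and simply counts the matching lines; the function name is made up.

```python
def count_walks(cmk_output: str, oid: str, host: str) -> int:
    """Count how often the given OID was walked on the host."""
    needle = f'Executing BULKWALK of "{oid}" on {host}'
    return sum(1 for line in cmk_output.splitlines() if needle in line)


sample_output = """\
Executing BULKWALK of ".1.3.6.1.2.1.15.3.1.3" on ro01
Executing BULKWALK of ".1.3.6.1.2.1.15.3.1.3" on ro01
Executing BULKWALK of ".1.3.6.1.2.1.15.3.1.3" on ro01
Executing BULKWALK of ".1.3.6.1.2.1.15.3.1.3" on ro01
"""

configured_contexts = 4  # number of contexts in my rule
walks = count_walks(sample_output, ".1.3.6.1.2.1.15.3.1.3", "ro01")
print(walks == configured_contexts)
```

If the count is lower than the number of contexts in your rule, CMK is not using all of them for this section.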

Just a small hint for seeing a bit more on the command line:
switch the SNMP backend of the tested device to “Classic”, then you will see the complete snmpwalk command.

I’m getting this error saying it doesn’t match any host, but that host is live

:~$ cmk --no-cache --plugins bgp_peer -vI es01
Hostname or tag specification ‘es01’ does not match any host.

I tried this just to see, and it doesn’t seem like anything is happening. A new line starts and a > appears on it, but it just sits there

Is your BGP device configured under this exact name in your CMK instance?

I removed part of the hostname from the output, but yes I used the exact name it’s configured under

I think you should double check this as the error message indicates otherwise

Yeah, I’m definitely using the correct hostname. Tried a few different ones as well but it’s not liking it. I’m able to run that snmpwalk command to specify the VRF context against the same hostname. Just not working with cmk --no-cache --plugins bgp_peer -vI

OK. Just to be clear, with the cmk command it’s not the hostname. Be that as it may, I’m out of ideas for the time being and therefore out here :frowning:

Thank you for your help at least.

Would you happen to be able to spin up a virtual Arista switch, set up custom VRFs, and see if you’re still able to specify the contexts and pull them in? I just want to see whether this is an issue that only I’m having or whether there is a bigger issue with pulling multiple contexts on Aristas.

I’m leaning towards multiple plugins, one per VRF, being our best route. We’re able to specify the context, so we know that works on Aristas for a single VRF. We’re just not able to specify multiple VRFs under 1 plugin.

If you specify default in the context rule, do you also get BGP peers in the default VRF as well? We have BGP peers in the default and in custom VRFs, so we would need to be able to pull default as well. If we have multiple plugins per VRF that’ll be moot. Just curious though.