2.1.0p8 snmpv3 authNoPriv - Cannot fetch system description

PhilippL · July 22, 2022, 8:21am

CMK version: 2.1.0p8.cee
OS version: Check_MK virt1 v1.5.4

Error message:
Check_MK Discovery
Cannot fetch system description OID .1.3.6.1.2.1.1.1.0. Please check your SNMP configuration. Possible reason might be: Wrong credentials

Description:
SNMP Checks with snmpv3 authNoPriv fail on 2.1.0p8.cee.

Steps to isolate / reproduce the issue:

Create two clean Checkmk instances with the versions 2.0.0p22.cee & 2.1.0p8.cee
Add only 1 host to each site with exactly the same configuration (Fujitsu IRMC Device)
Do a discovery via GUI or CMD
Run cmk -D on both sites to compare the SNMP parameters in use directly

Type of agent 2.0.0p22.cee & 2.1.0p8.cee compared

SNMP (Credentials: 'authNoPriv, md5, checkmk, strongpw!, Bulk walk: yes, Port: 161, Backend: Inline)
SNMP (Credentials: 'authNoPriv, md5, checkmk, strongpw!, Bulk walk: yes, Port: 161, Backend: Inline)

2.0.0p22.cee → works like a charm
2.1.0p8.cee → Discovery / SNMP-Checks fail

Full output

OMD[test]:~/etc$ omd version
OMD - Open Monitoring Distribution Version 2.1.0p8.cee
OMD[test]:~/etc$ cmk -D esx05-irmc

esx05-irmc
Addresses:              192.168.1.105
Tags:                   [address_family:ip-v4-only], [agent:special-agents], [criticality:prod], [ip-v4:ip-v4], [networking:lan], [piggyback:auto-piggyback], [site:test], [snmp:snmp], [snmp_ds:snmp-v2], [tcp:tcp]
Labels:                 [cmk/site:test]
Host groups:            check_mk
Contact groups:         all
Agent mode:             No Checkmk agent, all configured special agents
Type of agent:
  SNMP (Credentials: 'authNoPriv, md5, checkmk, strongpw!, Bulk walk: yes, Port: 161, Backend: Inline)
  Process piggyback data from /omd/sites/test/tmp/check_mk/piggyback/esx05-irmc
Services:
  checktype item params description groups
  --------- ---- ------ ----------- ------

OMD[test]:~$ omd version
OMD - Open Monitoring Distribution Version 2.0.0p22.cee
OMD[test]:~$ cmk -D esx05-irmc

esx05-irmc
Addresses:              192.168.1.105
Tags:                   [address_family:ip-v4-only], [agent:cmk-agent], [criticality:prod], [ip-v4:ip-v4], [networking:lan], [piggyback:auto-piggyback], [site:test], [snmp:snmp], [snmp_ds:snmp-v2], [tcp:tcp]
Labels:
Host groups:            check_mk
Contact groups:         all
Agent mode:             No Checkmk agent, all configured special agents
Type of agent:
  SNMP (Credentials: 'authNoPriv, md5, checkmk, strongpw!, Bulk walk: yes, Port: 161, Backend: Inline)
  Process piggyback data from /omd/sites/test/tmp/check_mk/piggyback/esx05-irmc
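
To compare the two `cmk -D` dumps mechanically rather than by eye, something like the following could be used. It is only a sketch: `parse_tags` is a hypothetical helper, not part of Checkmk, and the two strings are the `Tags:` lines copied from the dumps above. It shows that the SNMP line is not the only place to look, since the tag sets differ in the `agent` group. Whether that difference matters for this problem is not established here.

```python
import re


def parse_tags(dump):
    """Return the set of 'group:value' tags from the Tags: line of a cmk -D dump."""
    for line in dump.splitlines():
        if line.startswith("Tags:"):
            # Each tag is printed as [group:value]
            return set(re.findall(r"\[([^\]]+)\]", line))
    return set()


# Tags: lines copied from the two dumps above
dump_21 = (
    "Tags:                   [address_family:ip-v4-only], [agent:special-agents], "
    "[criticality:prod], [ip-v4:ip-v4], [networking:lan], [piggyback:auto-piggyback], "
    "[site:test], [snmp:snmp], [snmp_ds:snmp-v2], [tcp:tcp]"
)
dump_20 = (
    "Tags:                   [address_family:ip-v4-only], [agent:cmk-agent], "
    "[criticality:prod], [ip-v4:ip-v4], [networking:lan], [piggyback:auto-piggyback], "
    "[site:test], [snmp:snmp], [snmp_ds:snmp-v2], [tcp:tcp]"
)

only_in_21 = parse_tags(dump_21) - parse_tags(dump_20)
only_in_20 = parse_tags(dump_20) - parse_tags(dump_21)

print("only on 2.1.0p8: ", sorted(only_in_21))
print("only on 2.0.0p22:", sorted(only_in_20))
```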

Inventory Debug output

OMD[test]:~/etc$ cmk --debug -IIvvv esx05-irmc
#Discovering services and host labels on: esx05-irmc
esx05-irmc:
+ FETCHING DATA
  Source: SourceType.HOST/FetcherType.SNMP
[cpu_tracking] Start [7fe520f53940]
[SNMPFetcher] Fetch with cache settings: SNMPFileCache(esx05-irmc, base_path=/omd/sites/test/tmp/check_mk/data_source_cache/snmp, max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=False, use_outdated=False, simulation=False)
Not using cache (Too old. Age is 1550 sec, allowed is 120 sec)
[SNMPFetcher] Execute data source
  SNMP scan:
       Getting OID .1.3.6.1.2.1.1.1.0: Executing SNMP GET of .1.3.6.1.2.1.1.1.0 on esx05-irmc
=> [None] NOSUCHINSTANCE
failed.
[cpu_tracking] Stop [7fe520f53940 - Snapshot(process=posix.times_result(user=0.020000000000000018, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.029999997466802597))]
  Source: SourceType.HOST/FetcherType.PIGGYBACK
[cpu_tracking] Start [7fe520e4c040]
[PiggybackFetcher] Fetch with cache settings: NoCache(esx05-irmc, base_path=/omd/sites/test/tmp/check_mk/data_source_cache/piggyback, max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=True, use_outdated=False, simulation=False)
Not using cache (Cache usage disabled)
[PiggybackFetcher] Execute data source
No piggyback files for 'esx05-irmc'. Skip processing.
No piggyback files for '192.168.1.105'. Skip processing.
Not using cache (Cache usage disabled)
[cpu_tracking] Stop [7fe520e4c040 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
+ PARSE FETCHER RESULTS
  Source: SourceType.HOST/FetcherType.SNMP
  -> Not adding sections: Cannot fetch system description OID .1.3.6.1.2.1.1.1.0. Please check your SNMP configuration. Possible reason might be: Wrong credentials, wrong SNMP version, Firewall rules, etc.
  Source: SourceType.HOST/FetcherType.PIGGYBACK
No persisted sections
  -> Add sections: []
Received no piggyback data
Received no piggyback data
+ ANALYSE DISCOVERED HOST LABELS
Trying host label discovery with:
Trying host label discovery with:
SUCCESS - Found no host labels
+ ANALYSE DISCOVERED SERVICES
+ EXECUTING DISCOVERY PLUGINS (0)
  Trying discovery with:
SUCCESS - Found no services

Has anybody seen similar behaviour on 2.1.x versions?

Best Regards
Philipp

andreas-doehler · July 22, 2022, 3:27pm

As a first check I would switch the SNMP backend from inline to the old legacy one for this device.
It is quite possible that the current Python SNMP lib doesn’t want to talk to the very old SNMP implementation on the iRMC.
Most of my SNMP problems are inline SNMP backend problems.

PhilippL · July 25, 2022, 10:52am

Hi Andreas,
That was also my first thought, but unfortunately switching to Backend: Classic doesn’t help. I also found that the classic backend throws an exception in version 2.0.0p22.cee as well.
After resetting to the factory default “inline SNMP”, the check works again in 2.0.0p22.cee.

I assume CMK 2.1 handles the check_legacy_includes differently than 2.0.0p22.cee. I’ll investigate; maybe I can find out more.

andreas-doehler · July 25, 2022, 11:40am

Can the snmpwalk/snmpget command that becomes visible after switching to the classic backend be run manually? I have already had cases here where such commands did not work because of one of the options used.

PhilippL · July 26, 2022, 8:21am

snmpget fails because there is no value behind the OID; instead the tree branches one level deeper:

[SNMPFetcher] Execute data source
  SNMP scan:
       Getting OID .1.3.6.1.2.1.1.1.0: Running 'snmpget -v3 -l authNoPriv -a md5 -u checkmk -A StrongPW! -m "" -M "" -On -OQ -Oe -Ot x.x.x.105 .1.3.6.1.2.1.1.1.0'
SNMP answer: ==> [No Such Instance currently exists at this OID]

snmpwalk works, and so does snmpget when the OID is queried directly (with a zero appended).

snmpget -v3 -l authNoPriv -a md5 -u checkmk -A StrongPW! -m "" -M "" -On -OQ -Oe -Ot x.x.x.105 .1.3.6.1.2.1.1.1.0.0
.1.3.6.1.2.1.1.1.0.0 = "Primergy iRMC S5 Feb 24 2022 12:44:50 JST"

What puzzles me now is why the same check still works with 2.0.0p22.cee (inline SNMP). I suspect that the missing “.0” on the OID is caught somewhere in the code there, which then simply descends further into the tree. If so, this behaviour could also affect other SNMP checks.

It would be nice to have a way to stay on the old, “tested” check codebase without having to migrate everything to the new check API right away. Migrating is of course also a solution, but there is not always time for it, and it can be hard to test when a check handles several device types.
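
The difference between the failing exact GET and the working walk can be illustrated with a small simulation. It is only a sketch: `AGENT`, `snmp_get` and `snmp_getnext` are hypothetical names that model an SNMP agent as a dict; they are not Checkmk or Net-SNMP code. The sysDescr entry is the one from the snmpget output above, and the second entry is the sysObjectID value from a walk of the same device.

```python
def oid_key(oid):
    """Turn '.1.3.6' into (1, 3, 6) so OIDs sort numerically, not as text."""
    return tuple(int(part) for part in oid.strip(".").split("."))


# What the affected iRMC exposes: sysDescr sits one level too deep
AGENT = {
    ".1.3.6.1.2.1.1.1.0.0": "Primergy iRMC S5 Feb 24 2022 12:44:50 JST",
    ".1.3.6.1.2.1.1.2.0": ".1.3.6.1.4.1.231.1.28.1",
}


def snmp_get(oid):
    """Exact lookup, like snmpget: None stands in for 'No Such Instance'."""
    return AGENT.get(oid)


def snmp_getnext(oid):
    """Return the first (oid, value) sorted after `oid`, like one step of a walk."""
    start = oid_key(oid)
    later = sorted((k for k in AGENT if oid_key(k) > start), key=oid_key)
    if not later:
        return None
    return later[0], AGENT[later[0]]


print(snmp_get(".1.3.6.1.2.1.1.1.0"))      # exact GET on the standard OID
print(snmp_getnext(".1.3.6.1.2.1.1.1.0"))  # GETNEXT starting from the same OID
```

In this model the exact GET finds nothing, while GETNEXT lands on the entry with the extra “.0”, which matches what snmpget and snmpwalk report for the device.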

PhilippL · July 26, 2022, 9:57am

I can now confirm that it is definitely down to the handling of the 0 in the first OID.

To test this, I created a walk on 2.0.0p22.cee and copied it to 2.1.0p8.cee.
Same behaviour with the cached data from 2.0.0p22.cee: no discovery possible.

Wrote fetched data to /omd/sites/test/var/check_mk/snmpwalks/wmesx05-irmc.
OMD[test]:~/tmp/check_mk$ cat /omd/sites/test/var/check_mk/snmpwalks/wmesx05-irmc
.1.3.6.1.2.1.1.1.0.0 Primergy iRMC S5 Feb 24 2022 12:44:50 JST
.1.3.6.1.2.1.1.2.0 .1.3.6.1.4.1.231.1.28.1

After manually removing the 0 from the cached walk, discovery and checks work immediately in 2.1.0p8.cee.

That is of course not a solution, but at least it is now clear where the problem comes from.
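
For repeating this test, the manual edit of the stored walk could be scripted roughly as follows. This is a sketch under assumptions: `normalize_walk` is a hypothetical helper, and the "OID, space, value" line format is taken from the `cat` output above. Like the manual edit, it is a diagnostic aid for stored walks, not a fix for live monitoring.

```python
SYS_DESCR = ".1.3.6.1.2.1.1.1.0"


def normalize_walk(lines):
    """Rewrite a sysDescr entry stored as SYS_DESCR + '.0' back to SYS_DESCR."""
    fixed = []
    for line in lines:
        # Each stored line is '<oid> <value>'; the value may contain spaces
        oid, sep, value = line.partition(" ")
        if oid == SYS_DESCR + ".0":
            oid = SYS_DESCR
        fixed.append(oid + sep + value)
    return fixed


# The two lines of the stored walk shown above
walk = [
    ".1.3.6.1.2.1.1.1.0.0 Primergy iRMC S5 Feb 24 2022 12:44:50 JST",
    ".1.3.6.1.2.1.1.2.0 .1.3.6.1.4.1.231.1.28.1",
]

for line in normalize_walk(walk):
    print(line)
```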

andreas-doehler · July 26, 2022, 10:12am

That is really “nice”. The second 0 is actually “forbidden” there.
But as always, apparently not all vendors stick to even the oldest SNMP standards themselves.

The nice part is above all that the OIDs that follow appear correctly again.

andreas-doehler · August 9, 2022, 4:40pm

@PhilippL good news: there is a workaround that is actually not even “ugly”.
I should probably send it as a PR, as a simplification.
I had the same problem today with about 10 iRMCs.
Update to 2.1 and all iRMCs were dead in monitoring.

Solution: I copied all fsc_sc2_* checks to local and replaced the line

    "snmp_scan_function": is_fsc_sc2,

with

    "snmp_scan_function": lambda oid: oid(".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0"),

Why do I now consider this variant even better than the previous “is_fsc_sc2”?
The previous discovery did the following:

    (
    oid(".1.3.6.1.2.1.1.2.0").startswith(".1.3.6.1.4.1.231")
    or oid(".1.3.6.1.2.1.1.2.0").startswith(".1.3.6.1.4.1.311")
    or oid(".1.3.6.1.2.1.1.2.0").startswith(".1.3.6.1.4.1.8072")
    ) and 
    oid(".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0")

So my solution even needs less discovery effort than the current one.
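
The comparison of the two scan functions can be tried out offline with a fake `oid` callable. This is only a sketch: `make_oid`, `scan_simplified` and `DEVICE` are hypothetical names, `is_fsc_sc2` is rebuilt from the expression quoted above, and the sample values are the ones reported for this device in the thread. The counter counts lookups in this fake, which is not necessarily the number of SNMP requests Checkmk would send.

```python
def make_oid(table, calls):
    """Build a fake `oid` callable over a dict; `calls` records every OID asked for."""
    def oid(name):
        calls.append(name)
        return table.get(name)
    return oid


def is_fsc_sc2(oid):
    # Previous scan logic, rebuilt from the expression quoted above
    return bool(
        (
            oid(".1.3.6.1.2.1.1.2.0").startswith(".1.3.6.1.4.1.231")
            or oid(".1.3.6.1.2.1.1.2.0").startswith(".1.3.6.1.4.1.311")
            or oid(".1.3.6.1.2.1.1.2.0").startswith(".1.3.6.1.4.1.8072")
        )
        and oid(".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0")
    )


def scan_simplified(oid):
    # Proposed replacement: only ask for the ServerControl 2 agent OID
    return bool(oid(".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0"))


# Sample values as reported for the iRMC in this thread
DEVICE = {
    ".1.3.6.1.2.1.1.2.0": ".1.3.6.1.4.1.231.1.28.1",
    ".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0": "ServerView ServerControl 2 hardware monitoring agent",
}

old_calls, new_calls = [], []
old_result = is_fsc_sc2(make_oid(DEVICE, old_calls))
new_result = scan_simplified(make_oid(DEVICE, new_calls))

print("old:", old_result, "lookups:", len(old_calls))
print("new:", new_result, "lookups:", len(new_calls))
```

Both variants accept this device; the simplified one gets there with a single lookup and no longer depends on the sysObjectID prefix.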

PhilippL · August 9, 2022, 9:20pm

Hi Andreas,
thanks for your feedback; that sounds like a very good workaround, or rather an alternative solution.
I just quickly tested it on the 2.1 instance, unfortunately without success so far. The local changes apparently do not take effect, because the old OID is still being requested:

OMD[main]:~/local/share/check_mk/checks$ ls -l
total 32
-rw-r----- 1 main main 1793 Aug  9 22:48 fsc_sc2_cpu_status
-rw-r----- 1 main main 2364 Aug  9 22:38 fsc_sc2_fans
-rw-r----- 1 main main 1743 Aug  9 22:38 fsc_sc2_info
-rw-r----- 1 main main 1746 Aug  9 22:38 fsc_sc2_mem_status
-rw-r----- 1 main main 2271 Aug  9 22:38 fsc_sc2_power_consumption
-rw-r----- 1 main main 2912 Aug  9 22:38 fsc_sc2_psu
-rw-r----- 1 main main 2354 Aug  9 22:38 fsc_sc2_temp
-rw-r----- 1 main main 2346 Aug  9 22:38 fsc_sc2_voltage

OMD[main]:~/local/share/check_mk/checks$ grep snmp_scan_function fsc_sc2_*
fsc_sc2_cpu_status:    "snmp_scan_function": lambda oid: oid(".1.3.6.1.2.1.1.1.0.0"),
fsc_sc2_fans:    "snmp_scan_function": lambda oid: oid(".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0"),
fsc_sc2_info:    "snmp_scan_function": lambda oid: oid(".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0"),
fsc_sc2_mem_status:    "snmp_scan_function": lambda oid: oid(".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0"),
fsc_sc2_power_consumption:    "snmp_scan_function": lambda oid: oid(".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0"),
fsc_sc2_psu:    "snmp_scan_function": lambda oid: oid(".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0"),
fsc_sc2_temp:    "snmp_scan_function": lambda oid: oid(".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0"),
fsc_sc2_voltage:    "snmp_scan_function": lambda oid: oid(".1.3.6.1.4.1.231.2.10.2.2.10.1.1.0"),

with inline SNMP

-> Not adding sections: Cannot fetch system description OID .1.3.6.1.2.1.1.1.0. Please check your SNMP configuration.

with the classic SNMP backend

[SNMPFetcher] Execute data source
  SNMP scan:
       Getting OID .1.3.6.1.2.1.1.1.0: Running 'snmpget -v3 -l authNoPriv -a md5 -u checkmk -A PWXY  -m "" -M "" -t 5.00 -r 3 -On -OQ -Oe -Ot x.x.x.x .1.3.6.1.2.1.1.1.0'

Your new scan OID definitely matches, though:

.1.3.6.1.4.1.231.2.10.2.2.10.1.1.0 = "ServerView ServerControl 2 hardware monitoring agent"

I’ll try to look at this calmly tomorrow; maybe it is simply too late today.

andreas-doehler · August 10, 2022, 5:13am

Ah, I forgot to mention: the hosts must be configured as “Host without System description OID”,
since they “don’t have” one.

PhilippL · August 10, 2022, 7:32am

Thanks for the hint, that was our fundamental problem.
In closing, it should be noted that this behaviour may well affect other devices / checks too, if the system description OID contains this extra “0” and is therefore apparently handled differently in 2.1.

andreas-doehler · August 10, 2022, 7:56am

You’re right. So far, though, I have only seen this on Fujitsu systems like your iRMCs.

Norm · August 19, 2022, 3:00pm

I have the same problem. I have opened a ticket with Fujitsu about it. Let’s wait and see if and when something comes of it.

andreas-doehler · August 19, 2022, 3:05pm

There is already a current firmware that fixes the problem.

Norm · August 19, 2022, 3:19pm

Good to know; that was already a few days ago.

Norm · August 22, 2022, 11:29am

Here is the official answer from Fujitsu:

The iRMC firmware 3.37P and 3.39P contain the following errors in SNMP response data: After updating the iRMC firmware to 3.37P or 3.39P, the system can no longer be queried correctly by the monitoring system via SNMP. The cause is a SNMP OID “.1.3.6.1.2.1.1.0”. With the new firmware suddenly an OID “.1.3.6.1.2.1.1.0.0” is used. (So with an additional “.0” at the end).

Output with iRMC firmware before 3.37P:
.1.3.6.1.2.1.1.1.0 = “Linux CESVMHost1-iRMC 3.14.17-ami #1 SMP PREEMPT Wed Jul 8 17:12:15 IST 2020 armv7l”

Output with iRMC 3.37P and 3.39P:
.1.3.6.1.2.1.1.1.0.0 = “Primergy iRMC S5 Mar 4 2022 16:42:33 JST”

The updated iRMC firmware version 3.37P or 3.39P is affected by this issue. This issue has been fixed in 3.42P and will be released at the end of Sep.2022. Wait for the 3.42 P release or revert to a firmware version earlier than 3.37 P.
