Best Practice - Distributed Monitoring - Timeouts - SNMP - stale

Hi Andreas,

Thanks a lot for your input so far. I'm attaching the output of cmk --debug -vvn here. I ran it against two devices and don't see any significant differences between the two outputs. What also puzzles me is that the services on the affected SNMP devices apparently are no longer queried at all once they have gone stale. The last service check time here is 18 h, or am I misreading that?
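On the 18 h question: as far as I understand it, staleness is the age of the last check result expressed in check intervals, and a service is shown as stale once that ratio reaches a threshold. Below is a minimal sketch of that arithmetic. The 1-minute check interval and the threshold of 1.5 are assumptions, not values read from this site's configuration, and the function names are made up for illustration.

```python
def staleness(age_seconds: float, check_interval_seconds: float) -> float:
    """Age of the last check result, measured in check intervals."""
    if check_interval_seconds <= 0:
        raise ValueError("check interval must be positive")
    return age_seconds / check_interval_seconds


def is_stale(age_seconds: float,
             check_interval_seconds: float,
             threshold: float = 1.5) -> bool:
    """True once the result is at least `threshold` intervals old.

    threshold=1.5 is an assumed default, not taken from this site.
    """
    return staleness(age_seconds, check_interval_seconds) >= threshold


# Last check 18 h ago, assumed check interval of 60 s:
print(staleness(18 * 3600, 60))  # 1080.0
print(is_stale(18 * 3600, 60))   # True
```

Under these assumptions, a last check time of 18 h means more than a thousand missed intervals, which would fit the impression that the services are not being checked at all rather than just arriving late.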

Output of cmk --debug

### SNMP device that constantly has stale services ###

cmk --debug -vvn xxx.20.52.xx

[cpu_tracking] Start with phase ‘busy’
Check_MK version 1.6.0p9
Try aquire lock on /omd/sites/cmk/tmp/check_mk/counters/xxx.20.52.xx
Got lock on /omd/sites/cmk/tmp/check_mk/counters/xxx.20.52.xx
Releasing lock on /omd/sites/cmk/tmp/check_mk/counters/xxx.20.52.xx
Released lock on /omd/sites/cmk/tmp/check_mk/counters/xxx.20.52.xx

  • FETCHING DATA
    [cpu_tracking] Push phase ‘snmp’ (Stack: [‘busy’])
    [snmp] No persisted sections loaded
    [snmp] Not using cache (Does not exist)
    [snmp] Execute data source
    [snmp] Write data to cache file /omd/sites/cmk/tmp/check_mk/data_source_cache/snmp/xxx.20.52.xx
    Try aquire lock on /omd/sites/cmk/tmp/check_mk/data_source_cache/snmp/xxx.20.52.xx
    Got lock on /omd/sites/cmk/tmp/check_mk/data_source_cache/snmp/xxx.20.52.xx
    Releasing lock on /omd/sites/cmk/tmp/check_mk/data_source_cache/snmp/xxx.20.52.xx
    Released lock on /omd/sites/cmk/tmp/check_mk/data_source_cache/snmp/xxx.20.52.xx
    [cpu_tracking] Pop phase ‘snmp’ (Stack: [‘busy’, ‘snmp’])
    [cpu_tracking] Push phase ‘agent’ (Stack: [‘busy’])
    [piggyback] No persisted sections loaded
    [piggyback] Execute data source
    No piggyback files for ‘xxx.20.52.xx’. Skip processing.
    No piggyback files for ‘xxx.20.52.xx’. Skip processing.
    [cpu_tracking] Pop phase ‘agent’ (Stack: [‘busy’, ‘agent’])
    [cpu_tracking] End
    OK - [snmp] Success, execution time 0.0 sec | execution_time=0.015 user_time=0.010 system_time=0.000 children_user_time=0.000 children_system_time=0.000 cmk_time_snmp=0.002 cmk_time_agent=0.001

### SNMP device that works ###

cmk --debug -vvn xxx.20.12.xx

[cpu_tracking] Start with phase ‘busy’
Check_MK version 1.6.0p9
Try aquire lock on /omd/sites/cmk/tmp/check_mk/counters/xxx.20.12.xx
Got lock on /omd/sites/cmk/tmp/check_mk/counters/xxx.20.12.xx
Releasing lock on /omd/sites/cmk/tmp/check_mk/counters/xxx.20.12.xx
Released lock on /omd/sites/cmk/tmp/check_mk/counters/xxx.20.12.xx

  • FETCHING DATA
    [cpu_tracking] Push phase ‘snmp’ (Stack: [‘busy’])
    [snmp] No persisted sections loaded
    [snmp] Not using cache (Does not exist)
    [snmp] Execute data source
    [snmp] Write data to cache file /omd/sites/cmk/tmp/check_mk/data_source_cache/snmp/xxx.20.12.xx
    Try aquire lock on /omd/sites/cmk/tmp/check_mk/data_source_cache/snmp/xxx.20.12.xx
    Got lock on /omd/sites/cmk/tmp/check_mk/data_source_cache/snmp/xxx.20.12.xx
    Releasing lock on /omd/sites/cmk/tmp/check_mk/data_source_cache/snmp/xxx.20.12.xx
    Released lock on /omd/sites/cmk/tmp/check_mk/data_source_cache/snmp/xxx.20.12.xx
    [cpu_tracking] Pop phase ‘snmp’ (Stack: [‘busy’, ‘snmp’])
    [cpu_tracking] Push phase ‘agent’ (Stack: [‘busy’])
    [piggyback] No persisted sections loaded
    [piggyback] Execute data source
    No piggyback files for ‘xxx.20.12.xx’. Skip processing.
    No piggyback files for ‘xxx.20.12.xx’. Skip processing.
    [cpu_tracking] Pop phase ‘agent’ (Stack: [‘busy’, ‘agent’])
    [cpu_tracking] End
    OK - [snmp] Success, execution time 0.0 sec | execution_time=0.018 user_time=0.010 system_time=0.000 children_user_time=0.000 children_system_time=0.000 cmk_time_snmp=0.004 cmk_time_agent=0.002
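To compare the two runs more rigorously than by eye, the host addresses and timing values can be masked before diffing, so that only structural differences remain. This is a minimal sketch using only the Python standard library. The regular expressions are my own assumptions about what varies between runs (masked addresses like xxx.20.52.xx and decimal timings), and the function names are hypothetical.

```python
import difflib
import re

# Dotted quad whose octets may be masked with 'x' (e.g. xxx.20.52.xx).
HOST_RE = re.compile(r"\b[0-9x]{1,3}(?:\.[0-9x]{1,3}){3}\b")
# Decimal numbers such as 0.015 in the performance data.
NUM_RE = re.compile(r"\b\d+\.\d+\b")


def normalize(line: str) -> str:
    """Replace host addresses and decimal values with placeholders."""
    line = HOST_RE.sub("<HOST>", line)
    line = NUM_RE.sub("<N>", line)
    return line.rstrip()


def differing_lines(log_a: str, log_b: str) -> list[str]:
    """Return only the lines that differ after normalization.

    Lines prefixed '- ' occur only in log_a, '+ ' only in log_b.
    """
    a = [normalize(l) for l in log_a.splitlines()]
    b = [normalize(l) for l in log_b.splitlines()]
    return [d for d in difflib.ndiff(a, b) if d.startswith(("- ", "+ "))]
```

Calling `differing_lines()` with the two outputs saved as strings returns an empty list when the runs differ only in address and timings, which is what I would expect for the two outputs above.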

It makes no difference whether the site runs on a VM or in a container. I have other locations with an absolutely identical config (Docker) where this works without any problems.