CMK version: 2.3.0p18cre
OS version: Ubuntu 22.04 / Docker
Issue:
Since we switched to “Appearance of network interface: Use description” on our Checkmk instance, the Check_MK Discovery service keeps flapping, because docker0 is being picked up even though our configuration is meant to ignore it.
Interestingly, it is being picked up as “Interface 4”, for example, i.e. using the old naming scheme:
The other interfaces are as expected:
This is our interface discovery configuration, which successfully filtered out docker0 before we changed the appearance to “Use description”:
Configure discovery of single interfaces: Discover single interfaces,
Appearance of network interface: Use description
Port numbers: Do not pad
Conditions for this rule to apply: Specify matching conditions,
Match port states: 1 - up
Match interface alias (regex): (?!lo)(?!veth[\w\d@]+)(?!docker0)[\w\d@.]+
Match interface description (regex): (?!lo)(?!veth[\w\d@]+)(?!docker0)[\w\d@.]+
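To make the intent of the two regex conditions easier to check, here is a minimal sketch that runs the pattern from the rule against a few names. It assumes the pattern is applied as a match from the start of the string (the semantics of Python's `re.match`); what Checkmk actually compares the pattern against in this situation is not verified here. The names `veth1a2b3c@if5` and `local0` are hypothetical examples, and `4` stands for a bare interface index.

```python
import re

# Pattern copied verbatim from the discovery rule above.
PATTERN = r"(?!lo)(?!veth[\w\d@]+)(?!docker0)[\w\d@.]+"


def is_selected(name: str) -> bool:
    """Return True if the pattern matches at the start of `name`.

    Assumption: the rule uses prefix-match semantics like re.match.
    """
    return re.match(PATTERN, name) is not None


# Names taken from the post, plus two hypothetical ones
# (veth1a2b3c@if5, local0) and a bare index ("4").
candidates = [
    "bond0",
    "enp34s0f0",
    "docker0",
    "lo",
    "veth1a2b3c@if5",
    "local0",
    "4",
]

for name in candidates:
    print(f"{name!r:20} selected={is_selected(name)}")
```

Under these assumptions docker0, lo and veth names are rejected, while a bare index such as 4 passes, since the pattern only excludes specific name prefixes. Note that `(?!lo)` also rejects any name that merely begins with "lo", such as the hypothetical `local0`.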
Since we have workloads that start and stop docker regularly, we don’t care about the interface being up (or rather, it’d be annoying if we were monitoring it).
However, almost as bad is that we are now getting continuous notifications that Check_MK Discovery found a new interface.
Services unmonitored: 1 (lnx_if: 1) WARN, Host labels: all up to date
How can we stop this again?
The reason for switching to “Use description” is that we had fluctuating interface IDs, which required rediscovering the services every now and then.
Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)
value store: synchronizing
Trying to acquire lock on /omd/sites/hpc/tmp/check_mk/counters/fq.dn
Got lock on /omd/sites/hpc/tmp/check_mk/counters/fq.dn
value store: loading from disk
Releasing lock on /omd/sites/hpc/tmp/check_mk/counters/fq.dn
Released lock on /omd/sites/hpc/tmp/check_mk/counters/fq.dn
Checkmk version 2.3.0p18
+ FETCHING DATA
Source: SourceInfo(hostname='fq.dn', ipaddress='123.456.7.8', ident='agent', fetcher_type=<FetcherType.TCP: 8>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f77770914c0]
Read from cache: AgentFileCache(fq.dn, path_template=/omd/sites/hpc/tmp/check_mk/cache/{hostname}, max_age=MaxAge(checking=0, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=6)
Not using cache (Too old. Age is 14 sec, allowed is 0 sec)
Connecting via TCP to 123.456.7.8:6556 (5.0s timeout)
Detected transport protocol: TransportProtocol.TLS
Reading data from agent via TLS socket
Reading data from agent
Detected transport protocol: TransportProtocol.PLAIN
Closing TCP connection to 123.456.7.8:6556
Write data to cache file /omd/sites/hpc/tmp/check_mk/cache/fq.dn
Trying to acquire lock on /omd/sites/hpc/tmp/check_mk/cache/fq.dn
Got lock on /omd/sites/hpc/tmp/check_mk/cache/fq.dn
Releasing lock on /omd/sites/hpc/tmp/check_mk/cache/fq.dn
Released lock on /omd/sites/hpc/tmp/check_mk/cache/fq.dn
[cpu_tracking] Stop [7f77770914c0 - Snapshot(process=posix.times_result(user=0.009999999999999787, system=0.010000000000000009, children_user=0.0, children_system=0.0, elapsed=5.469999998807907))]
Source: SourceInfo(hostname='fq.dn', ipaddress='123.456.7.8', ident='piggyback', fetcher_type=<FetcherType.PIGGYBACK: 4>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f7776e03ce0]
Read from cache: NoCache(fq.dn, path_template=/dev/null, max_age=MaxAge(checking=0.0, discovery=0.0, inventory=0.0), simulation=False, use_only_cache=False, file_cache_mode=1)
No piggyback files for 'fq.dn'. Skip processing.
No piggyback files for '123.456.7.8'. Skip processing.
Get piggybacked data
[cpu_tracking] Stop [7f7776e03ce0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
[cpu_tracking] Start [7f77770913d0]
+ PARSE FETCHER RESULTS
<<<check_mk>>> / Transition NOOPParser -> HostSectionParser
<<<cmk_agent_ctl_status:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<checkmk_agent_plugins_lnx:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<labels:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<df_v2>>> / Transition HostSectionParser -> HostSectionParser
<<<df_v2>>> / Transition HostSectionParser -> HostSectionParser
<<<systemd_units>>> / Transition HostSectionParser -> HostSectionParser
<<<nfsmounts_v2:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<cifsmounts>>> / Transition HostSectionParser -> HostSectionParser
<<<mounts>>> / Transition HostSectionParser -> HostSectionParser
<<<ps_lnx>>> / Transition HostSectionParser -> HostSectionParser
<<<mem>>> / Transition HostSectionParser -> HostSectionParser
<<<cpu>>> / Transition HostSectionParser -> HostSectionParser
<<<uptime>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_bonding:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<tcp_conn_stats>>> / Transition HostSectionParser -> HostSectionParser
<<<multipath>>> / Transition HostSectionParser -> HostSectionParser
<<<diskstat>>> / Transition HostSectionParser -> HostSectionParser
<<<kernel>>> / Transition HostSectionParser -> HostSectionParser
<<<md>>> / Transition HostSectionParser -> HostSectionParser
<<<vbox_guest>>> / Transition HostSectionParser -> HostSectionParser
<<<job>>> / Transition HostSectionParser -> HostSectionParser
<<<chrony:cached(1730225358,120)>>> / Transition HostSectionParser -> HostSectionParser
<<<ipmi:cached(1730225179,300):sep(124)>>> / Transition HostSectionParser -> HostSectionParser
<<<ipmi_discrete:cached(1730225118,300):sep(124)>>> / Transition HostSectionParser -> HostSectionParser
<<<local:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<nfsiostat>>> / Transition HostSectionParser -> HostSectionParser
<<<smart>>> / Transition HostSectionParser -> HostSectionParser
HostKey(hostname='fq.dn', source_type=<SourceType.HOST: 1>) -> Add sections: ['check_mk', 'checkmk_agent_plugins_lnx', 'chrony', 'cifsmounts', 'cmk_agent_ctl_status', 'cpu', 'df_v2', 'diskstat', 'ipmi', 'ipmi_discrete', 'job', 'kernel', 'labels', 'lnx_bonding', 'lnx_if', 'local', 'md', 'mem', 'mounts', 'multipath', 'nfsiostat', 'nfsmounts_v2', 'ps_lnx', 'smart', 'systemd_units', 'tcp_conn_stats', 'uptime', 'vbox_guest']
HostKey(hostname='fq.dn', source_type=<SourceType.HOST: 1>) -> Add sections: []
Received no piggyback data
Bonding Interface bond0 Status: up, Mode: IEEE 802.3ad Dynamic link aggregation, enp34s0f1/3C:EC:EF:0C:A2:91 up, enp34s0f0/3C:EC:EF:0C:A2:90 up
CPU load 15 min load: 250.25, 15 min load per core: 0.98 (256 cores)
CPU utilization Total CPU: 97.61%
Check_MK Agent Version: 2.3.0p14, OS: linux, Agent plug-ins: 2, Local checks: 0
Disk IO SUMMARY Read: 0.00 B/s, Write: 146 kB/s, Latency: 1 millisecond
Filesystem / Used: 14.08% - 28.5 GiB of 202 GiB, trend per 1 day 0 hours: -2.22 MiB, trend per 1 day 0 hours: -0.00%
Filesystem /boot Used: 31.45% - 309 MiB of 984 MiB, trend per 1 day 0 hours: -1 B, trend per 1 day 0 hours: -0.00%
Filesystem /raid Used: 6.22% - 552 GiB of 8.66 TiB, trend per 1 day 0 hours: -4.61 GiB, trend per 1 day 0 hours: -0.05%
IPMI Sensor Summary 53 sensors in total, 49 sensors ok, 1 sensors critical(!!), PS2_Status: ok (Presence detected, Failure detected, Power Supply AC lost)(!!), 3 sensors skipped
Interface bond0 [2], (up), MAC: 3C:EC:EF:0C:A2:90, Speed: 20 GBit/s, In: 1.03 kB/s (<0.01%), Out: 19.1 kB/s (<0.01%)
Interface enp34s0f0 [6], (up), MAC: 3C:EC:EF:0C:A2:90, Speed: 10 GBit/s, In: 760 B/s (<0.01%), Out: 0.00 B/s (0%)
Interface enp34s0f1 [7], (up), MAC: 3C:EC:EF:0C:A2:91, Speed: 10 GBit/s, In: 270 B/s (<0.01%), Out: 19.1 kB/s (<0.01%)
Interface enp65s0f0np0 [8], (up), MAC: B8:CE:F6:5A:A4:22, Speed: 100 GBit/s, In: 114 B/s (<0.01%), Out: 11.0 kB/s (<0.01%)
Kernel Performance Process Creations: 18.33/s, Context Switches: 2081.86/s, Major Page Faults: 0.00/s, Page Swap in: 0.00/s, Page Swap Out: 0.00/s
MD Softraid md0 Status: active, Spare: 0, Failed: 0, Active: 2, Status: 2/2, UU
MD Softraid md2 Status: active, Spare: 0, Failed: 0, Active: 2, Status: 2/2, UU
Memory Total virtual memory: 5.28% - 55.4 GiB of 1.02 TiB, 9 additional details available
Mount options of / Mount options exactly as expected
Mount options of /boot Mount options exactly as expected
Mount options of /raid Mount options exactly as expected
NFS IO stats 192.168.7.10:/GPU/Data/home Operations: 35.65/s, RPC Backlog: 0.00, Read operations: 0.65/s, Reads size: 20.0 B/s, Read bytes per operation: 30.91 B/op, Read Retransmission: 0%, Read average RTT: 504 microseconds, Read average EXE: 589 microseconds, Write operations: 4.92/s, Writes size: 1.09 kB/s, Write bytes per operation: 222.49 B/op, Write Retransmission: 0%, Write Average RTT: 2 milliseconds, Write Average EXE: 3 milliseconds
NFS IO stats 192.168.7.10:/GPU/Data/shareX Operations: 35.65/s, RPC Backlog: 0.00, Read operations: 0.65/s, Reads size: 20.0 B/s, Read bytes per operation: 30.91 B/op, Read Retransmission: 0%, Read average RTT: 504 microseconds, Read average EXE: 589 microseconds, Write operations: 4.92/s, Writes size: 1.09 kB/s, Write bytes per operation: 222.49 B/op, Write Retransmission: 0%, Write Average RTT: 2 milliseconds, Write Average EXE: 3 milliseconds
NFS IO stats 192.168.7.10:/cvmp/cvmp Operations: 35.65/s, RPC Backlog: 0.00, Read operations: 0.00/s, Reads size: 0.00 B/s, Read bytes per operation: 0.00 B/op, Read Retransmission: 0%, Read average RTT: 0 seconds, Read average EXE: 0 seconds, Write operations: 0.00/s, Writes size: 0.00 B/s, Write bytes per operation: 0.00 B/op, Write Retransmission: 0%, Write Average RTT: 0 seconds, Write Average EXE: 0 seconds
NFS mount /home Source: 192.168.7.10:/GPU/Data/home, Used: 82.51% - 105 TiB of 127 TiB (warn/crit at 80.00%/90.00% used)(!), trend per 1 day 0 hours: +103 GiB, trend per 1 day 0 hours: +0.08%, Time left until disk full: 220 days 20 hours
NFS mount /shareY Source: 192.168.7.10:/shareY/shareY, Used: 93.57% - 33.9 TiB of 36.2 TiB (warn/crit at 80.00%/90.00% used)(!!), trend per 1 day 0 hours: +19.0 GiB, trend per 1 day 0 hours: +0.05%, Time left until disk full: 125 days 22 hours
NFS mount /shareX Source: 192.168.7.10:/GPU/Data/shareX, Used: 82.51% - 105 TiB of 127 TiB (warn/crit at 80.00%/90.00% used)(!), trend per 1 day 0 hours: +103 GiB, trend per 1 day 0 hours: +0.08%, Time left until disk full: 220 days 20 hours
NTP Time Offset: 0.0000 ms, Stratum: 3, Time since last sync: 18 minutes 1 second
Number of threads 3080, Usage: 0.04%
SMART /dev/nvme0n1 Stats Powered on: 2 years 223 days, Power cycles: 26, Critical warning: 0, Media and data integrity errors: 0, Available spare: 100.00%, Percentage used: 0%, Error information log entries: 73, Data units read: 3.66 TiB, Data units written: 13.7 TiB
SMART /dev/nvme1n1 Stats Powered on: 2 years 223 days, Power cycles: 26, Critical warning: 0, Media and data integrity errors: 0, Available spare: 100.00%, Percentage used: 0%, Error information log entries: 73, Data units read: 3.81 TiB, Data units written: 17.2 TiB
SMART /dev/nvme2n1 Stats Powered on: 2 years 223 days, Power cycles: 25, Critical warning: 0, Media and data integrity errors: 0, Available spare: 100.00%, Percentage used: 0%, Error information log entries: 73, Data units read: 3.55 TiB, Data units written: 14.9 TiB
SMART /dev/nvme3n1 Stats Powered on: 2 years 164 days, Power cycles: 24, Critical warning: 0, Media and data integrity errors: 0, Available spare: 100.00%, Percentage used: 0%, Error information log entries: 57, Data units read: 3.27 TiB, Data units written: 13.2 TiB
SMART 0ATA_INTEL_SSDSC2KG24_BTYG026201DY240AGN Stats Reallocated sectors: 0, Powered on: 2 years 223 days, Power cycles: 27, End-to-End errors: 0, Uncorrectable errors: 0, Pending sectors: 0, CRC errors: 0
SMART 0ATA_INTEL_SSDSC2KG24_BTYG026304RL240AGN Stats Reallocated sectors: 0, Powered on: 2 years 223 days, Power cycles: 27, End-to-End errors: 0, Uncorrectable errors: 0, Pending sectors: 0, CRC errors: 0
Systemd Service Summary Total: 199, Disabled: 6, Failed: 0
Systemd Socket Summary Total: 24, Disabled: 0, Failed: 0
TCP Connections Established: 14
Temperature SMART /dev/nvme0n1 32 °C
Temperature SMART /dev/nvme1n1 33 °C
Temperature SMART /dev/nvme2n1 33 °C
Temperature SMART /dev/nvme3n1 32 °C
Temperature SMART 0ATA_INTEL_SSDSC2KG24_BTYG026201DY240AGN 21 °C
Temperature SMART 0ATA_INTEL_SSDSC2KG24_BTYG026304RL240AGN 23 °C
Uptime Up since 2024-10-07 22:52:49, Uptime: 21 days 21 hours
No piggyback files for 'fq.dn'. Skip processing.
No piggyback files for '123.456.7.8'. Skip processing.
[cpu_tracking] Stop [7f77770913d0 - Snapshot(process=posix.times_result(user=0.06999999999999984, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.5))]
[agent] Success, [piggyback] Success (but no data found for this host), execution time 6.0 sec | execution_time=5.970 user_time=0.080 system_time=0.010 children_user_time=0.000 children_system_time=0.000 cmk_time_agent=5.450