Windows Agents Intermittently Failing

CMK version: 2.0.0p23 CEE
OS version: Ubuntu 20.04

Error message: [agent] MKTimeout(‘Fetcher for host “—snip—” timed out after 60 seconds’)

Output of “cmk --debug -vvn hostname”:

OMD[---site---]:~$ cmk --debug -vvn ---host---
Checkmk version 2.0.0p23
Try license usage history update.
Trying to acquire lock on /omd/sites/---site---/var/check_mk/license_usage/next_run
Got lock on /omd/sites/---site---/var/check_mk/license_usage/next_run
Trying to acquire lock on /omd/sites/---site---/var/check_mk/license_usage/history.json
Got lock on /omd/sites/---site---/var/check_mk/license_usage/history.json
Next run time has not been reached yet. Abort.
Releasing lock on /omd/sites/---site---/var/check_mk/license_usage/history.json
Released lock on /omd/sites/---site---/var/check_mk/license_usage/history.json
Releasing lock on /omd/sites/---site---/var/check_mk/license_usage/next_run
Released lock on /omd/sites/---site---/var/check_mk/license_usage/next_run
Loading autochecks from /omd/sites/---site---/var/check_mk/autochecks/---host---.mk
+ FETCHING DATA
  Source: SourceType.HOST/FetcherType.TCP
[cpu_tracking] Start [7fc15566ddc0]
[TCPFetcher] Fetch with cache settings: DefaultAgentFileCache(base_path=PosixPath('/omd/sites/---site---/tmp/check_mk/cache/---host---'), max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=False, use_outdated=False, simulation=False)
Not using cache (Too old. Age is 493 sec, allowed is 0 sec)
[TCPFetcher] Execute data source
Connecting via TCP to ---host---:6556 (15.0s timeout)
Reading data from agent
Output is not encrypted
Write data to cache file /omd/sites/---site---/tmp/check_mk/cache/---host---
Trying to acquire lock on /omd/sites/---site---/tmp/check_mk/cache/---host---
Got lock on /omd/sites/---site---/tmp/check_mk/cache/---host---
Releasing lock on /omd/sites/---site---/tmp/check_mk/cache/---host---
Released lock on /omd/sites/---site---/tmp/check_mk/cache/---host---
Closing TCP connection to ---host---:6556
[cpu_tracking] Stop [7fc15566ddc0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=118.92999999970198))]
  Source: SourceType.HOST/FetcherType.PIGGYBACK
[cpu_tracking] Start [7fc155568670]
[PiggybackFetcher] Fetch with cache settings: NoCache(base_path=PosixPath('/omd/sites/---site---/tmp/check_mk/data_source_cache/piggyback/---host---'), max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=False, use_outdated=False, simulation=False)
[PiggybackFetcher] Execute data source
Piggyback file '/omd/sites/---site---/tmp/check_mk/piggyback/---host---/---piggy---': ---piggy---
Piggyback file '/omd/sites/---site---/tmp/check_mk/piggyback/---host---/---piggy---': ---piggy---
[cpu_tracking] Stop [7fc155568670 - Snapshot(process=posix.times_result(user=0.010000000000000009, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
[cpu_tracking] Start [7fc15566ddf0]
+ PARSE FETCHER RESULTS
  Source: SourceType.HOST/FetcherType.TCP
Trying to acquire lock on /omd/sites/---site---/var/check_mk/persisted/---host---
Got lock on /omd/sites/---site---/var/check_mk/persisted/---host---
Releasing lock on /omd/sites/---site---/var/check_mk/persisted/---host---
Released lock on /omd/sites/---site---/var/check_mk/persisted/---host---
Stored persisted sections: win_cpuinfo, win_os, win_bios, win_system, win_computersystem, win_disks, win_video, win_networkadapter, win_ip_r, win_wmi_software, win_wmi_updates, win_reg_uninstall, win_exefiles
Using persisted section SectionName('win_cpuinfo')
Using persisted section SectionName('win_os')
Using persisted section SectionName('win_bios')
Using persisted section SectionName('win_system')
Using persisted section SectionName('win_computersystem')
Using persisted section SectionName('win_disks')
Using persisted section SectionName('win_video')
Using persisted section SectionName('win_networkadapter')
Using persisted section SectionName('win_ip_r')
Using persisted section SectionName('win_wmi_software')
Using persisted section SectionName('win_wmi_updates')
Using persisted section SectionName('win_reg_uninstall')
Using persisted section SectionName('win_exefiles')
  -> Add sections: ['check_mk', 'df', 'dotnet_clrmemory', 'fileinfo', 'logwatch', 'mem', 'mssql_backup', 'mssql_blocked_sessions', 'mssql_cluster', 'mssql_connections', 'mssql_counters', 'mssql_databases', 'mssql_datafiles', 'mssql_instance', 'mssql_jobs', 'mssql_tablespaces', 'mssql_transactionlogs', 'mssql_versions', 'ps', 'services', 'systemtime', 'uptime', 'win_bios', 'win_computersystem', 'win_cpuinfo', 'win_disks', 'win_exefiles', 'win_ip_r', 'win_license', 'win_networkadapter', 'win_os', 'win_reg_uninstall', 'win_system', 'win_video', 'win_wmi_software', 'win_wmi_updates', 'windows_tasks', 'windows_updates', 'winperf_if', 'winperf_phydisk', 'winperf_processor', 'wmi_cpuload']
  Source: SourceType.HOST/FetcherType.PIGGYBACK
No persisted sections loaded
  -> Add sections: ['esx_vsphere_vm', 'labels']
Received no piggyback data
Loading item states
Trying to acquire lock on /omd/sites/---site---/tmp/check_mk/counters/---host---
Got lock on /omd/sites/---site---/tmp/check_mk/counters/---host---
Releasing lock on /omd/sites/---site---/tmp/check_mk/counters/---host---
Released lock on /omd/sites/---site---/tmp/check_mk/counters/---host---
CPU utilization      Total CPU (15 min average): 53.93%, Core core0: 99.07% (warn/crit at 97.00%/99.50%)(!)
Check_MK Agent       No errors, Last update check: 2022-06-28 17:36:03, Last agent update: 2022-06-24 21:09:48, Update URL: https://---checkmk---/---site---/check_mk, Agent configuration: 25ee7fa2
Disk IO SUMMARY      Read: 10.5 MB/s, Write: 588 kB/s, Latency: 583 microseconds
DotNet Memory Management _Global_ Time in GC: 1.22%
ESX CPU              demand is 9.825 Ghz, 8 virtual CPUs
ESX Datastores       Stored on ---site---_---datastore--- (1.64 TB/60.9% free)
ESX Guest Tools      VMware Tools are installed and the version is current
ESX Heartbeat        Heartbeat status is green
ESX Hostsystem       Running on ---hostsystem---
ESX Memory           Host: 10.07 GB, Guest: 3.10 GB, Ballooned: 0.00 B, Private: 10.00 GB, Shared: 0.00 B
ESX Mounted Devices  HA functionality guaranteed
ESX Name             ---site-------server---
ESX Snapshots        Count: 0
Filesystem C:/       79.45% used (59.31 of 74.66 GB), trend: +127.12 MB / 24 hours - time left until disk full: 4 months
Filesystem D:/       28.45% used (11.38 of 40.00 GB), trend: -131.06 MB / 24 hours
Interface 1          [Ethernet0, Intel[R] 82574L Gigabit Network Connection], (Connected), MAC: 00:50:56:A2:BC:1E, Speed: 1 GBit/s, In average 15min: 121 MB/s (96.93%), Out average 15min: 7.07 MB/s (5.65%)
Interface 2          [isatap.{A195558E-A52A-499E-9472-B137A9A6ABE0}, isatap.{A195558E-A52A-499E-9472-B137A9A6ABE0}], (Connected), Speed: 100 kBit/s, In average 15min: 0.00 B/s (0%), Out average 15min: 0.00 B/s (0%)
Log Application      No error messages
Log HardwareEvents   No error messages
Log Internet Explorer No error messages
Log Key Management Service No error messages
Log Security         No error messages
Log System           No error messages
Log Veeam Backup     Forwarded 0 messages
Log Windows PowerShell No error messages
MSSQL Blocked Sessions Summary: 56 blocked by 1 ID(s), 60 blocked by 1 ID(s), 61 blocked by 1 ID(s), 65 blocked by 1 ID(s), 67 blocked by 1 ID(s), 85 blocked by 1 ID(s), 91 blocked by 1 ID(s)(!!)
MSSQL Connections ---database--- VeeamBackup Connections: 40
MSSQL Connections ---database--- master Connections: 27
MSSQL Connections ---database--- model Connections: 0
MSSQL Connections ---database--- msdb Connections: 0
MSSQL Connections ---database--- tempdb Connections: 0
MSSQL Datafile ---database---.VeeamBackup.VeeamBackup Used: 1.74 GiB, Allocated used: 1.74 GiB, Allocated: 1.98 GiB, Maximum size: unlimited
MSSQL Datafile ---database---.master.master Used: 3.00 MiB, Allocated used: 3.00 MiB, Allocated: 4.00 MiB, Maximum size: unlimited
MSSQL Datafile ---database---.model.modeldev Used: 2.00 MiB, Allocated used: 2.00 MiB, Allocated: 8.00 MiB, Maximum size: unlimited
MSSQL Datafile ---database---.msdb.MSDBData Used: 17.0 MiB, Allocated used: 17.0 MiB, Allocated: 17.0 MiB, Maximum size: unlimited
MSSQL Datafile ---database---.tempdb.tempdev Used: 8.00 MiB, Allocated used: 8.00 MiB, Allocated: 136 MiB, Maximum size: unlimited
MSSQL Job: syspolicy_purge_history Last duration: 0.00 s, MSSQL status: Unknown, Last run: N/A, Next run: N/A
MSSQL MSSQL_---database--- Locks per Batch 308.4
MSSQL MSSQL_---database--- VeeamBackup File Sizes Data files: 1.98 GiB, Log files total: 1.13 GiB, Log files used: 5.86 MiB
MSSQL MSSQL_---database--- VeeamBackup Sizes Size: 3.12 GB, Unallocated space: 248.21 MB, 7.78%, Reserved space: 1.74 GB, 55.86%, Data: 1.41 GB, 45.39%, Indexes: 242.97 MB, 7.62%, Unused: 91.20 MB, 2.86%
MSSQL MSSQL_---database--- VeeamBackup Transactions Transactions: 4.2/s, Write Transactions: 1.0/s, Tracked Transactions: 0.0/s
MSSQL MSSQL_---database--- _Total File Sizes Data files: 2.18 GiB, Log files total: 1.17 GiB, Log files used: 10.6 MiB
MSSQL MSSQL_---database--- _Total Transactions Transactions: 25.0/s, Write Transactions: 12.2/s, Tracked Transactions: 0.0/s
MSSQL MSSQL_---database--- master Backup [database] Last backup was at 2020-04-25 20:50:28 (794 d ago)
MSSQL MSSQL_---database--- master File Sizes Data files: 4.44 MiB, Log files total: 1.99 MiB, Log files used: 1.02 MiB
MSSQL MSSQL_---database--- master Sizes Size: 6.44 MB, Unallocated space: 860.16 kB, 13.04%, Reserved space: 3.59 MB, 55.8%, Data: 1.39 MB, 21.59%, Indexes: 1.57 MB, 24.38%, Unused: 648.00 kB, 9.83%
MSSQL MSSQL_---database--- master Transactions Transactions: 0.2/s, Write Transactions: 0.0/s, Tracked Transactions: 0.0/s
MSSQL MSSQL_---database--- model Backup [database] Last backup was at 2020-04-25 20:50:28 (794 d ago)
MSSQL MSSQL_---database--- model File Sizes Data files: 8.00 MiB, Log files total: 7.99 MiB, Log files used: 1019 KiB
MSSQL MSSQL_---database--- model Sizes Size: 16.00 MB, Unallocated space: 5.62 MB, 35.12%, Reserved space: 2.38 MB, 14.89%, Data: 920.00 kB, 5.62%, Indexes: 1.12 MB, 7.03%, Unused: 368.00 kB, 2.25%
MSSQL MSSQL_---database--- model Transactions Transactions: 0.1/s, Write Transactions: 0.0/s, Tracked Transactions: 0.0/s
MSSQL MSSQL_---database--- msdb Backup [database] Last backup was at 2020-04-25 20:50:28 (794 d ago)
MSSQL MSSQL_---database--- msdb File Sizes Data files: 17.9 MiB, Log files total: 19.6 MiB, Log files used: 1.37 MiB
MSSQL MSSQL_---database--- msdb Sizes Size: 37.56 MB, Unallocated space: 921.60 kB, 2.4%, Reserved space: 17.04 MB, 45.36%, Data: 12.76 MB, 33.97%, Indexes: 3.41 MB, 9.09%, Unused: 888.00 kB, 2.31%
MSSQL MSSQL_---database--- msdb Transactions Transactions: 0.1/s, Write Transactions: 0.0/s, Tracked Transactions: 0.0/s
MSSQL MSSQL_---database--- mssqlsystemresource File Sizes Data files: 40.0 MiB, Log files total: 1.24 MiB, Log files used: 502 KiB
MSSQL MSSQL_---database--- mssqlsystemresource Transactions Transactions: 0.0/s, Write Transactions: 0.0/s, Tracked Transactions: 0.0/s
MSSQL MSSQL_---database--- tempdb File Sizes Data files: 136 MiB, Log files total: 7.99 MiB, Log files used: 898 KiB
MSSQL MSSQL_---database--- tempdb Sizes Size: 144.00 MB, Unallocated space: 132.01 MB, 91.67%, Reserved space: 3.99 MB, 2.77%, Data: 1.14 MB, 0.79%, Indexes: 1.31 MB, 0.91%, Unused: 1.54 MB, 1.07%
MSSQL MSSQL_---database--- tempdb Transactions Transactions: 20.5/s, Write Transactions: 11.2/s, Tracked Transactions: 0.0/s
MSSQL MSSQL_---database---:Buffer_Manager None Page Activity Reads: 1200.8/s, Writes: 41.7/s, Lookups: 9314.8/s
MSSQL MSSQL_---database---:Buffer_Manager None buffer_cache_hit_ratio 94.50%
MSSQL MSSQL_---database---:Catalog_Metadata VeeamBackup cache_hit_ratio 56.29%
MSSQL MSSQL_---database---:Catalog_Metadata _Total cache_hit_ratio 63.18%
MSSQL MSSQL_---database---:Catalog_Metadata master cache_hit_ratio 75.92%
MSSQL MSSQL_---database---:Catalog_Metadata model cache_hit_ratio 72.80%
MSSQL MSSQL_---database---:Catalog_Metadata msdb cache_hit_ratio 65.64%
MSSQL MSSQL_---database---:Catalog_Metadata mssqlsystemresource cache_hit_ratio 81.63%
MSSQL MSSQL_---database---:Catalog_Metadata tempdb cache_hit_ratio 98.77%
MSSQL MSSQL_---database---:Cursor_Manager_by_Type TSQL_Global_Cursor cache_hit_ratio 94.76%
MSSQL MSSQL_---database---:Cursor_Manager_by_Type TSQL_Local_Cursor cache_hit_ratio 95.15%
MSSQL MSSQL_---database---:Cursor_Manager_by_Type _Total cache_hit_ratio 95.00%
MSSQL MSSQL_---database---:Locks AllocUnit Locks Requests: 0.7/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.0/s
MSSQL MSSQL_---database---:Locks Application Locks Requests: 0.0/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.0/s
MSSQL MSSQL_---database---:Locks Database Locks Requests: 689.2/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.0/s
MSSQL MSSQL_---database---:Locks Extent Locks Requests: 14.0/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.0/s
MSSQL MSSQL_---database---:Locks File Locks Requests: 0.3/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.0/s
MSSQL MSSQL_---database---:Locks HoBT Locks Requests: 8.5/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.0/s
MSSQL MSSQL_---database---:Locks Key Locks Requests: 1664.1/s, Timeouts: 0.2/s, Deadlocks: 0.0/s, Waits: 0.0/s
MSSQL MSSQL_---database---:Locks Metadata Locks Requests: 4640.5/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.4/s
MSSQL MSSQL_---database---:Locks OIB Locks Requests: 0.0/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.0/s
MSSQL MSSQL_---database---:Locks Object Locks Requests: 3536.9/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.4/s
MSSQL MSSQL_---database---:Locks Page Locks Requests: 47.2/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.0/s
MSSQL MSSQL_---database---:Locks RID Locks Requests: 24.9/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.0/s
MSSQL MSSQL_---database---:Locks RowGroup Locks Requests: 0.0/s, Timeouts: 0.0/s, Deadlocks: 0.0/s, Waits: 0.0/s
MSSQL MSSQL_---database---:Locks _Total Locks Requests: 10626.3/s, Timeouts: 0.2/s, Deadlocks: 0.0/s, Waits: 0.8/s
MSSQL MSSQL_---database---:Plan_Cache Bound_Trees cache_hit_ratio 61.50%
MSSQL MSSQL_---database---:Plan_Cache Extended_Stored_Procedures cache_hit_ratio 96.85%
MSSQL MSSQL_---database---:Plan_Cache Object_Plans cache_hit_ratio 65.97%
MSSQL MSSQL_---database---:Plan_Cache SQL_Plans cache_hit_ratio 17.10%
MSSQL MSSQL_---database---:Plan_Cache Temporary_Tables_&_Table_Variables cache_hit_ratio 43.42%
MSSQL MSSQL_---database---:Plan_Cache _Total cache_hit_ratio 56.98%
MSSQL MSSQL_---database---:SQL_Statistics None batch_requests/sec 34.5/s
MSSQL MSSQL_---database---:SQL_Statistics None sql_compilations/sec 13.5/s
MSSQL MSSQL_---database---:SQL_Statistics None sql_re-compilations/sec 0.4/s
MSSQL Transactionlog ---database---.VeeamBackup.VeeamBackup_log Used: 5.00 MiB, Allocated used: 5.00 MiB, Allocated: 1.13 GiB, Maximum size: 2.00 TiB
MSSQL Transactionlog ---database---.master.mastlog Used: 0 B, Allocated used: 0 B, Allocated: 2.00 MiB, Maximum size: unlimited
MSSQL Transactionlog ---database---.model.modellog Used: 0 B, Allocated used: 0 B, Allocated: 8.00 MiB, Maximum size: unlimited
MSSQL Transactionlog ---database---.msdb.MSDBLog Used: 1.00 MiB, Allocated used: 1.00 MiB, Allocated: 19.0 MiB, Maximum size: 2.00 TiB
MSSQL Transactionlog ---database---.tempdb.templog Used: 1.00 MiB, Allocated used: 1.00 MiB, Allocated: 8.00 MiB, Maximum size: unlimited
MSSQL ---database--- Instance Version: Microsoft SQL Server 2016 (SP1) (13.0.4001.0) - Express Edition (64-bit)
MSSQL ---database--- VeeamBackup Database Status: ONLINE, Recovery: SIMPLE, Auto close: off, Auto shrink: off
MSSQL ---database--- master Database Status: ONLINE, Recovery: SIMPLE, Auto close: off, Auto shrink: off
MSSQL ---database--- model Database Status: ONLINE, Recovery: SIMPLE, Auto close: off, Auto shrink: off
MSSQL ---database--- msdb Database Status: ONLINE, Recovery: SIMPLE, Auto close: off, Auto shrink: off
MSSQL ---database--- tempdb Database Status: ONLINE, Recovery: SIMPLE, Auto close: off, Auto shrink: off
Memory and pagefile  RAM: 93.91% - 9.39 GB of 10.00 GB, Commit charge: 79.51% - 12.29 GB of 15.46 GB
Processor Queue      15 min load: 0.02 at 8 logical cores (0.00 per core)
Service Summary      Autostart services: 68, Stopped services: 3
System Time          Offset: -465 ms
System Updates       0 important updates, 0 optional updates
Uptime               Up since Jun 17 2022 13:49:20, Uptime: 11 days 4 hours
Windows License      Software is Licensed
+ EXECUTING INVENTORY PLUGINS
 hp_proliant_mem: skipped (no data)
 hp_proliant_mem: skipped (no data)
 ipmi_firmware: skipped (no data)
 ipmi_firmware: skipped (no data)
 inventory_checkmk: skipped (no data)
 inventory_checkmk: skipped (no data)
 inv_cisco_vlans: skipped (no data)
 inv_cisco_vlans: skipped (no data)
 inv_esx_vsphere_hostsystem: skipped (no data)
 inv_esx_vsphere_hostsystem: skipped (no data)
 inv_if: ok
 inv_if: skipped (no data)
 inventory_oracle_tablespaces: skipped (no data)
 inventory_oracle_tablespaces: skipped (no data)
 dmidecode: skipped (no data)
 dmidecode: skipped (no data)
 docker_node_network: skipped (no data)
 docker_node_network: skipped (no data)
 inventory_esx_vsphere_clusters: skipped (no data)
 inventory_esx_vsphere_clusters: skipped (no data)
 inventory_esx_vsphere_virtual_machines: skipped (no data)
 inventory_esx_vsphere_virtual_machines: skipped (no data)
 k8s_endpoint_info: skipped (no data)
 k8s_endpoint_info: skipped (no data)
 k8s_ingress_infos: skipped (no data)
 k8s_ingress_infos: skipped (no data)
 juniper_info: skipped (no data)
 juniper_info: skipped (no data)
 mem: ok
 mem: skipped (no data)
 mem_used: skipped (no data)
 mem_used: skipped (no data)
 snmp_info: skipped (no data)
 snmp_info: skipped (no data)
 aix_baselevel: skipped (no data)
 aix_baselevel: skipped (no data)
 aix_lparstat_inventory: skipped (no data)
 aix_lparstat_inventory: skipped (no data)
 aix_packages: skipped (no data)
 aix_packages: skipped (no data)
 aix_service_packs: skipped (no data)
 aix_service_packs: skipped (no data)
 allnet_ip_sensoric: skipped (no data)
 allnet_ip_sensoric: skipped (no data)
 aruba_wlc_aps: skipped (no data)
 aruba_wlc_aps: skipped (no data)
 check_mk: ok
 check_mk: skipped (no data)
 checkpoint_inv_tunnels: skipped (no data)
 checkpoint_inv_tunnels: skipped (no data)
 citrix_controller: skipped (no data)
 citrix_controller: skipped (no data)
 citrix_state: skipped (no data)
 citrix_state: skipped (no data)
 couchbase_nodes_ports: skipped (no data)
 couchbase_nodes_ports: skipped (no data)
 dell_hw_info: skipped (no data)
 dell_hw_info: skipped (no data)
 docker_container_labels: skipped (no data)
 docker_container_labels: skipped (no data)
 docker_container_network: skipped (no data)
 docker_container_network: skipped (no data)
 docker_container_node_name: skipped (no data)
 docker_container_node_name: skipped (no data)
 docker_node_images: skipped (no data)
 docker_node_images: skipped (no data)
 docker_node_info: skipped (no data)
 docker_node_info: skipped (no data)
 esx_systeminfo: skipped (no data)
 esx_systeminfo: skipped (no data)
 fireeye_sys_status: skipped (no data)
 fireeye_sys_status: skipped (no data)
 fritz: skipped (no data)
 fritz: skipped (no data)
 hp_proliant_da_phydrv: skipped (no data)
 hp_proliant_da_phydrv: skipped (no data)
 hp_proliant_systeminfo: skipped (no data)
 hp_proliant_systeminfo: skipped (no data)
 ibm_mq_channels: skipped (no data)
 ibm_mq_channels: skipped (no data)
 ibm_mq_managers: skipped (no data)
 ibm_mq_managers: skipped (no data)
 ibm_mq_queues: skipped (no data)
 ibm_mq_queues: skipped (no data)
 infoblox_osinfo: skipped (no data)
 infoblox_osinfo: skipped (no data)
 infoblox_systeminfo: skipped (no data)
 infoblox_systeminfo: skipped (no data)
 k8s_assigned_pods: skipped (no data)
 k8s_assigned_pods: skipped (no data)
 k8s_daemon_pod_containers: skipped (no data)
 k8s_daemon_pod_containers: skipped (no data)
 k8s_job_container: skipped (no data)
 k8s_job_container: skipped (no data)
 k8s_nodes: skipped (no data)
 k8s_nodes: skipped (no data)
 k8s_pod_container: skipped (no data)
 k8s_pod_container: skipped (no data)
 k8s_pod_info: skipped (no data)
 k8s_pod_info: skipped (no data)
 k8s_roles: skipped (no data)
 k8s_roles: skipped (no data)
 k8s_selector: skipped (no data)
 k8s_selector: skipped (no data)
 k8s_service_info: skipped (no data)
 k8s_service_info: skipped (no data)
 lnx_block_devices: skipped (no data)
 lnx_block_devices: skipped (no data)
 lnx_cpuinfo: skipped (no data)
 lnx_cpuinfo: skipped (no data)
 lnx_distro: skipped (no data)
 lnx_distro: skipped (no data)
 lnx_if: skipped (no data)
 lnx_if: skipped (no data)
 lnx_ip_r: skipped (no data)
 lnx_ip_r: skipped (no data)
 lnx_packages: skipped (no data)
 lnx_packages: skipped (no data)
 lnx_sysctl: skipped (no data)
 lnx_sysctl: skipped (no data)
 lnx_uname: skipped (no data)
 lnx_uname: skipped (no data)
 lnx_video: skipped (no data)
 lnx_video: skipped (no data)
 lparstat_aix: skipped (no data)
 lparstat_aix: skipped (no data)
 mssql_clusters: skipped (no data)
 mssql_clusters: skipped (no data)
 mssql_versions: ok
 mssql_versions: skipped (no data)
 netapp_api_disk: skipped (no data)
 netapp_api_disk: skipped (no data)
 netapp_api_info: skipped (no data)
 netapp_api_info: skipped (no data)
 oracle_dataguard_stats: skipped (no data)
 oracle_dataguard_stats: skipped (no data)
 oracle_instance: skipped (no data)
 oracle_instance: skipped (no data)
 oracle_performance: skipped (no data)
 oracle_performance: skipped (no data)
 oracle_recovery_area: skipped (no data)
 oracle_recovery_area: skipped (no data)
 oracle_systemparameter: skipped (no data)
 oracle_systemparameter: skipped (no data)
 perle_chassis: skipped (no data)
 perle_chassis: skipped (no data)
 perle_chassis_slots: skipped (no data)
 perle_chassis_slots: skipped (no data)
 perle_psmu: skipped (no data)
 perle_psmu: skipped (no data)
 prtconf: skipped (no data)
 prtconf: skipped (no data)
 snmp_extended_info: skipped (no data)
 snmp_extended_info: skipped (no data)
 snmp_os: skipped (no data)
 snmp_os: skipped (no data)
 snmp_quantum_storage_info: skipped (no data)
 snmp_quantum_storage_info: skipped (no data)
 solaris_addresses: skipped (no data)
 solaris_addresses: skipped (no data)
 solaris_pkginfo: skipped (no data)
 solaris_pkginfo: skipped (no data)
 solaris_prtdiag: skipped (no data)
 solaris_prtdiag: skipped (no data)
 solaris_prtpicl: skipped (no data)
 solaris_prtpicl: skipped (no data)
 solaris_psrinfo: skipped (no data)
 solaris_psrinfo: skipped (no data)
 solaris_routes: skipped (no data)
 solaris_routes: skipped (no data)
 solaris_uname: skipped (no data)
 solaris_uname: skipped (no data)
 statgrab_net: skipped (no data)
 statgrab_net: skipped (no data)
 suseconnect: skipped (no data)
 suseconnect: skipped (no data)
 win_bios: ok
 win_bios: skipped (no data)
 win_computersystem: ok
 win_computersystem: skipped (no data)
 win_cpuinfo: ok
 win_cpuinfo: skipped (no data)
 win_disks: ok
 win_disks: skipped (no data)
 win_exefiles: ok
 win_exefiles: skipped (no data)
 win_ip_r: ok
 win_ip_r: skipped (no data)
 win_networkadapter: ok
 win_networkadapter: skipped (no data)
 win_os: ok
 win_os: skipped (no data)
 win_reg_uninstall: ok
 win_reg_uninstall: skipped (no data)
 win_system: ok
 win_system: skipped (no data)
 win_video: ok
 win_video: skipped (no data)
 win_wmi_software: ok
 win_wmi_software: skipped (no data)
 win_wmi_updates: ok
 win_wmi_updates: skipped (no data)
 winperf_if: ok
 winperf_if: skipped (no data)

Piggyback file '/omd/sites/---site---/tmp/check_mk/piggyback/---host---/---piggy---': ---piggy---
Piggyback file '/omd/sites/---site---/tmp/check_mk/piggyback/---host---/---piggy---': ---piggy---
[cpu_tracking] Stop [7fc15566ddf0 - Snapshot(process=posix.times_result(user=0.17999999999999994, system=0.02999999999999997, children_user=0.0, children_system=0.0, elapsed=0.26999999955296516))]
[agent] Version: 2.0.0p23, OS: windows, Allowed IP ranges: ---checkmk---, [piggyback] Valid sources: ---piggy---, ---piggy---, Missing monitoring data for check plugins: veeam_jobs(!), execution time 119.2 sec | execution_time=119.200 user_time=0.190 system_time=0.030 children_user_time=0.000 children_system_time=0.000 cmk_time_agent=118.920

Hey, everyone.

Starting sometime in the last couple of weeks, a whole bunch of our Windows servers just stop running the Checkmk agent in a timely manner for a while. They’ll be going along, perfectly fine, then suddenly they just… stop. They’ll stay like that for a few hours, then begin working normally again.

If I run “check_mk_agent test” on the host, I get the same results: a couple of minutes of waiting, then a dump of results. Some of these were out-of-date agents (1.6.0p8) that had been running perfectly until the last couple of weeks, so I updated them to 2.0.0p23, but the same issue still occurs intermittently.

Does anyone have any suggestions on troubleshooting to find the issue?

Thanks,
Tralin

I think the most important thing for this problem is the agent config in use.
As this is a CEE with an active agent updater, the bakery YAML file would be relevant.
The user YAML should be the default file, I hope.

Hey, Andreas.

I used this server as an example because it was what was problematic at the time (since posting, this particular agent has corrected itself).

The older agents with the same problem (1.6.0p8) are not from the Bakery, as they were before we licensed CMK.

Thanks,
Tralin

Then the agent config in use is important even if it is not from the bakery.

Hello Tralin,

Debugging here with cmk is not so helpful because it doesn’t show the timing of each section.
At least I can see “veeam_jobs(!), execution time 119.2 sec” in the output. If you run with default settings, I expect the Checkmk check will run into a timeout.
You may either change the timeout settings and scheduling of the Checkmk check or, better, run the VEEAM plugin asynchronously to mitigate this. But as a first step I would check whether the root cause can be fixed. For VEEAM, I know that loading the PowerShell modules takes some time. PowerShell can be configured to preload such modules to avoid long load times. I am not an expert with Windows stuff and cannot help much more, but you may ask Dr. Google.

To have a better view of the timing of your agent have a look at C:\ProgramData\checkmk\agent\log\check_mk.
In the past we had several issues with WMI. In most cases a reboot of the Windows server fixed that.

I hope that helps

Michael

That was the complete execution time of the agent.
@Tralin inside the agent log folder you can see exactly where it took so long.
I expect that this is an agent configuration problem; that’s why I said the agent config would be very helpful here.

Yep, but the default timeout is 60 sec, if I am not mistaken.
The VEEAM suggestion was just a shot in the dark :wink:
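
For reference, the two numbers being discussed can be read straight from the performance data at the end of the cmk output. A minimal Python sketch, assuming `cmk_time_agent` is the time the site spent waiting for the agent (that is my reading of the metric name, not something confirmed in this thread); `parse_perfdata` is a hypothetical helper:

```python
def parse_perfdata(perfdata):
    """Split space-separated 'name=value' pairs into a dict of floats."""
    values = {}
    for pair in perfdata.split():
        name, _, raw = pair.partition("=")
        values[name] = float(raw)
    return values


# Performance data copied from the end of the cmk output in the first post.
PERFDATA = (
    "execution_time=119.200 user_time=0.190 system_time=0.030 "
    "children_user_time=0.000 children_system_time=0.000 cmk_time_agent=118.920"
)
FETCH_TIMEOUT_S = 60  # taken from the MKTimeout error message

perf = parse_perfdata(PERFDATA)
# Assumption: cmk_time_agent is the part of the run spent waiting for the agent.
agent_share = perf["cmk_time_agent"] / perf["execution_time"]
over_by = perf["cmk_time_agent"] - FETCH_TIMEOUT_S

print(f"waiting for agent: {perf['cmk_time_agent']:.1f} s ({agent_share:.1%} of the run)")
print(f"over the {FETCH_TIMEOUT_S} s fetcher timeout by {over_by:.1f} s")
```

With the values from the first post, nearly the whole run is spent waiting on the agent, well past the 60-second fetcher timeout from the error message.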

Thanks for the responses, Andreas and Mike.

I’m not sure the answer is in the agent configuration. The agents were functioning entirely normally for over two years with no configuration changes, updates, or otherwise. The only reason I’ve switched some of these over to the Bakery was to try to fix agents just not working. Nevertheless, here’s the agent Bakery config from the server I posted yesterday:

# Created by Check_MK Agent Bakery.
# This file is managed via WATO, do not edit manually or you
# lose your changes next time when you update the agent.

global:
  disabled_sections:
  - msexch
  - openhardwaremonitor
  - skype
  enabled: true
  install: true
  only_from:
  - ---checkmk---
  port: 6556
plugins:
  enabled: true
  execution:
  - async: true
    cache_age: 14400
    pattern: $CUSTOM_PLUGINS_PATH$\windows_updates.vbs
    timeout: 600
  - cache_age: 3600
    pattern: $CUSTOM_PLUGINS_PATH$\cmk_update_agent.checkmk.py
ps:
  enabled: true
  full_path: true
  use_wmi: true
system:
  enabled: true
  firewall:
    mode: configure
    port: auto
  service:
    error_mode: log
    restart_on_crash: 'yes'
    start_mode: auto

Going back through the log, I do see Veeam was taking a very long time to return data, but so was the MSSQL check.

2022-06-28 14:40:16.463 [srv 5408] [Err  ] Timeout [5] seconds broken  when query WMI
2022-06-28 14:40:16.464 [srv 5408] [Warn ] Object 'Win32_PerfRawData_NETFramework_NETCLRMemory' in 4994ms sends NO DATA
2022-06-28 14:40:16.465 [srv 5408] [Warn ] On timeout in section 'dotnet_clrmemory' try reuse cache
2022-06-28 14:40:16.467 [srv 5408] [Trace] Sending data 'dotnet_clrmemory' id is [967921619262512] length [8519]
2022-06-28 14:40:16.468 [srv 5408] perf: Section 'dotnet_clrmemory' took [4997] milliseconds
2022-06-28 14:40:16.479 [srv 5408] Received [8647] bytes from 'dotnet_clrmemory'
2022-06-28 14:41:27.030 [srv 5408] [Warn ] perf:  In [75605] milliseconds process 'powershell.exe -NoLogo -NoProfile -ExecutionPolicy Bypass -File "C:\ProgramData\checkmk\agent\plugins\veeam_backup_status.ps1"' pid:[13332] FAILED - generated [378] bytes of data in [2] blocks
2022-06-28 14:41:27.045 [srv 5408] [Warn ] Sync Plugin stopped 'C:\ProgramData\checkmk\agent\plugins\veeam_backup_status.ps1' Stopped: false Failed: true
2022-06-28 14:41:27.752 [srv 5408] [Warn ] perf:  In [76324] milliseconds process 'cscript.exe //Nologo "C:\ProgramData\checkmk\agent\plugins\mssql.vbs"' pid:[10572] FAILED - generated [191880] bytes of data in [71] blocks
2022-06-28 14:41:27.776 [srv 5408] [Warn ] Sync Plugin stopped 'C:\ProgramData\checkmk\agent\plugins\mssql.vbs' Stopped: false Failed: true
2022-06-28 14:41:27.794 [srv 5408] [Trace] Provider 'plugins' is about to be started, id '967921619262512' port [mail:\\.\mailslot\Global\WinAgent_0]
2022-06-28 14:41:27.796 [srv 5408] [Trace] Sending data 'plugins' id is [967921619262512] length [2820]
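
The slow steps in an excerpt like the one above can be pulled out mechanically. A minimal Python sketch, assuming the two `perf:` line shapes seen in this log (other agent versions may log differently); `slow_steps` and the 1000 ms threshold are made up for illustration:

```python
import re

# Patterns modelled on the two "perf:" line shapes in the excerpt above
# (assumption: the wording is stable for this agent version).
PLUGIN_RE = re.compile(
    r"perf:\s+In \[(?P<ms>\d+)\] milliseconds process '(?P<cmd>.+)' pid:\[\d+\]"
)
SECTION_RE = re.compile(
    r"perf:\s+Section '(?P<name>[^']+)' took \[(?P<ms>\d+)\] milliseconds"
)


def slow_steps(lines, threshold_ms=1000):
    """Return (name, milliseconds) pairs at or above threshold_ms, slowest first."""
    found = []
    for line in lines:
        match = PLUGIN_RE.search(line)
        if match:
            found.append((match.group("cmd"), int(match.group("ms"))))
            continue
        match = SECTION_RE.search(line)
        if match:
            found.append((match.group("name"), int(match.group("ms"))))
    slow = [item for item in found if item[1] >= threshold_ms]
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Three lines copied from the log excerpt in this post.
SAMPLE = r"""
2022-06-28 14:40:16.468 [srv 5408] perf: Section 'dotnet_clrmemory' took [4997] milliseconds
2022-06-28 14:41:27.030 [srv 5408] [Warn ] perf:  In [75605] milliseconds process 'powershell.exe -NoLogo -NoProfile -ExecutionPolicy Bypass -File "C:\ProgramData\checkmk\agent\plugins\veeam_backup_status.ps1"' pid:[13332] FAILED - generated [378] bytes of data in [2] blocks
2022-06-28 14:41:27.752 [srv 5408] [Warn ] perf:  In [76324] milliseconds process 'cscript.exe //Nologo "C:\ProgramData\checkmk\agent\plugins\mssql.vbs"' pid:[10572] FAILED - generated [191880] bytes of data in [71] blocks
"""

for name, ms in slow_steps(SAMPLE.splitlines()):
    print(f"{ms / 1000:8.1f} s  {name}")
```

Run against the full log file instead of the sample, this would list every section and plugin that took a second or longer, which should make it easier to compare a Veeam server with a non-Veeam one.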

While I can certainly see Veeam and MSSQL tying each other up, it’s still strange that after years of everything working fine it suddenly breaks. It also doesn’t explain the other servers that have had the same issue.

I’ll try to keep an eye on the monitor today and see if I can catch a non-Veeam server having the issue, or if a Veeam server does have the issue I’ll try manually running the Veeam check to see if it works, and try removing the Veeam check from the agent to test.

Thanks,
Tralin

You have a WMI timeout.

There is a possible solution:
https://kb.checkmk.com/display/KB/Increase+WMI+Timeout

But first try to reboot or even repair the WMI DB.
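
Before rebooting, it may help to check how often and for which sections WMI actually times out. A rough Python sketch, assuming the message wording from the log excerpt earlier in this thread; `wmi_timeouts` is a hypothetical helper, not part of Checkmk:

```python
import re
from collections import Counter

# Message shapes copied from the agent log excerpt in this thread
# (assumption: other agent versions may word these differently).
SECTION_RE = re.compile(r"On timeout in section '(?P<section>[^']+)'")
OBJECT_RE = re.compile(r"Object '(?P<obj>[^']+)' in (?P<ms>\d+)ms sends NO DATA")


def wmi_timeouts(lines):
    """Count timed-out sections and WMI objects that returned no data."""
    sections = Counter()
    objects = Counter()
    for line in lines:
        match = SECTION_RE.search(line)
        if match:
            sections[match.group("section")] += 1
        match = OBJECT_RE.search(line)
        if match:
            objects[match.group("obj")] += 1
    return sections, objects


SAMPLE = """\
2022-06-28 14:40:16.463 [srv 5408] [Err  ] Timeout [5] seconds broken  when query WMI
2022-06-28 14:40:16.464 [srv 5408] [Warn ] Object 'Win32_PerfRawData_NETFramework_NETCLRMemory' in 4994ms sends NO DATA
2022-06-28 14:40:16.465 [srv 5408] [Warn ] On timeout in section 'dotnet_clrmemory' try reuse cache
"""

sections, objects = wmi_timeouts(SAMPLE.splitlines())
for name, count in sections.most_common():
    print(f"{count:4d}  {name}")
```

If only one section shows up, the problem is probably local to that WMI class; many different sections timing out would point more toward WMI as a whole.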

For VEEAM I already wrote a possible solution.

For MSSQL, I remember we had a timing issue on a quite large shared DB that was lacking an index on msdb.backupset. If I remember correctly, this problem was fixed in newer versions of MSSQL.

Besides the findings from @mike1098, the biggest problem I see is the not-so-good configuration.

  • all plugin and local check execution should be set to async
  • all plugins with a longer runtime than some seconds, should be configured with a reasonable timeout and cache setting (veeam)

In the end, it is also possible that your WMI problem is the reason for the longer-than-usual plugin runtime.