CMK version: 2.0.0p24 (CEE)
OS version: Debian 9.13 (CheckMK Server), Microsoft Windows Server 2012 R2 Standard (Monitored Host)
Error message:
2022-08-03 04:32:32.404 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:32:32.405 [srv 11748] [Err ] queue is overflown
Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)
n/a
Hello, we have a problem that occurs when RAM usage on the Windows server gets very high, near 100%. The agent goes into a stale state in which it keeps throwing the above error message until it is restarted. Unfortunately, the issue does not resolve by itself once enough memory is free again.
The agent log looks like this, going from “service is working” to “stale” to “manual restart of the service by an admin”:
2022-08-03 03:52:23.174 [srv 11748] perf: Section 'local' took [1] milliseconds
2022-08-03 03:52:23.200 [srv 11748] Received [128] bytes from 'local'
2022-08-03 03:52:35.483 [srv 11748] perf: In [16076] milliseconds process 'powershell.exe -NoLogo -NoProfile -ExecutionPolicy Bypass -File "C:\ProgramData\checkmk\agent\plugins\windows_if.ps1"' pid:[32380] SUCCEDED - generated [0] bytes of data in [0] blocks
2022-08-03 03:52:35.485 [srv 11748] [Warn ] Process 'C:\ProgramData\checkmk\agent\plugins\windows_if.ps1' has no data
2022-08-03 03:55:29.190 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 03:58:20.057 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:01:29.392 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:04:20.244 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:05:30.351 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:07:30.398 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:10:21.240 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:11:30.595 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:13:30.899 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:16:21.434 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:17:31.795 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:19:31.779 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:22:21.633 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:23:31.965 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:25:31.938 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:28:22.584 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:29:33.375 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:29:33.377 [srv 11748] [Err ] queue is overflown
2022-08-03 04:31:32.062 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:31:32.063 [srv 11748] [Err ] queue is overflown
2022-08-03 04:32:32.404 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:32:32.405 [srv 11748] [Err ] queue is overflown
2022-08-03 04:34:22.767 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:34:22.768 [srv 11748] [Err ] queue is overflown
2022-08-03 04:35:23.070 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 04:35:23.072 [srv 11748] [Err ] queue is overflown
...
2022-08-03 07:47:24.948 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 07:47:24.950 [srv 11748] [Err ] queue is overflown
2022-08-03 07:48:25.165 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 07:48:25.166 [srv 11748] [Err ] queue is overflown
2022-08-03 07:49:25.372 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue
2022-08-03 07:49:25.374 [srv 11748] [Err ] queue is overflown
2022-08-03 07:50:00.483 [srv 11748] Initiating stop routine...
2022-08-03 07:50:00.484 [srv 11748] Stop Service called
2022-08-03 07:50:00.486 [srv 11748] [Trace] Stop request is set
2022-08-03 07:50:00.488 [srv 11748] [Trace] main Wait Loop END
2022-08-03 07:50:00.489 [srv 11748] Shutting down IO...
2022-08-03 07:50:00.490 [srv 11748] [Trace] Stopping execution
2022-08-03 07:51:19.915 [srv 18500] [Trace] Enabled Base
2022-08-03 07:51:19.923 [srv 18500] [Trace] Setting root. service: 'CheckMkService', preset: ''
2022-08-03 07:51:19.924 [srv 18500] [Trace] Try service: 'CheckMkService'
2022-08-03 07:51:19.925 [srv 18500] [Trace] Try registry 'CheckMkService'
2022-08-03 07:51:19.927 [srv 18500] [Trace] Service is found 'C:\Program Files (x86)\checkmk\service\check_mk_agent.exe'
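To pin down when the agent went stale, a log like the one above can be summarized with a small script. This is only a sketch: it assumes the timestamp layout and the two message texts exactly as they appear in the excerpt, and the function name `summarize` is made up for illustration.

```python
import re
from datetime import datetime

# Message texts and timestamp layout copied from the log excerpt above.
CONNECT = "Connected from"
OVERFLOW = "queue is overflown"
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def summarize(lines):
    """Return (connects up to first overflow, first overflow time, overflow count).

    The connect count includes the connection that triggered the first
    overflow, because its log line precedes the error line.
    """
    connects = 0
    first_overflow = None
    overflows = 0
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip "..." and other lines without a timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if OVERFLOW in line:
            overflows += 1
            if first_overflow is None:
                first_overflow = stamp
        elif CONNECT in line and first_overflow is None:
            connects += 1
    return connects, first_overflow, overflows


# Three lines taken from the excerpt, just to show the call.
sample = [
    "2022-08-03 04:28:22.584 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue",
    "2022-08-03 04:29:33.375 [srv 11748] Connected from '10.161.101.99' ipv6 :false -> queue",
    "2022-08-03 04:29:33.377 [srv 11748] [Err ] queue is overflown",
]
print(summarize(sample))
```

Counting by hand in the excerpt, 16 connections are logged between 03:55:29 and 04:28:22 without an error, and the 17th at 04:29:33 is the first to overflow. That would be consistent with a fixed queue limit that is never drained, but this is an inference from the log, not something the log states.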
The only thing I could find about the error “[Err ] queue is overflown” is in the agent's source code, in this file at line 244.
I’m not a programmer myself, but judging by the error message and the code snippet that throws it, my wild guesses are:
- A problem with the queue for connections from the CMK server: maybe an open session is never released, so every new connection is queued behind it?
- The agent's thread cannot be woken up from a sleep / idle state?
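Both guesses would produce the same visible pattern, which a toy model can illustrate. The sketch below is not the agent's code: the function `simulate`, the `stall_after` parameter, and the limit of 16 are assumptions made for illustration, with the limit chosen only so that the model mirrors the excerpt, where the 17th logged connection is the first to overflow.

```python
from collections import deque

MAX_QUEUE = 16  # hypothetical limit, picked to mirror the log excerpt


def simulate(connections, stall_after):
    """Feed `connections` sessions into a bounded queue.

    The worker handles one session per arriving connection until it has
    handled `stall_after` sessions, then never runs again (stuck session
    or thread that is never woken). Returns the 1-based numbers of the
    connections that were rejected.
    """
    pending = deque()
    handled = 0
    rejected = []
    for number in range(1, connections + 1):
        if len(pending) >= MAX_QUEUE:
            rejected.append(number)  # corresponds to "queue is overflown"
        else:
            pending.append(number)
        # The worker drains one session, but only while it is still alive.
        if handled < stall_after and pending:
            pending.popleft()
            handled += 1
    return rejected


print(simulate(20, stall_after=20))  # worker keeps up
print(simulate(20, stall_after=0))   # worker never runs
```

In this model a healthy worker never lets the queue fill, while a stalled worker lets it fill once and then rejects every later connection, no matter how much memory becomes free afterwards. Only a restart, which creates a fresh queue and worker, clears the condition.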
Thanks for any insight you can provide