CheckMK 2.1.0p24: unable to scan services from Windows Server 2019 Standard

CMK version: 2.1.0p24
OS version: Debian 4.19.269-1 (2022-12-20) x86_64 GNU/Linux

We recently updated our Checkmk system from 1.6 to 2.1.0p24. After the update, we installed the agent on a Windows Server 2019 Standard host. However, Checkmk cannot discover any services, even though firewall rules have been added and TCP port 6556 is open and listening.

Error message:
API Error:Error running automation call diag-host: Your request timed out after 110 seconds.
This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.

Output of “cmk --debug -vvn hostname”:
Precompile /omd/sites/monitoring/local/share/check_mk/checks/jolokia_metrics to /omd/sites/monitoring/var/check_mk/precompiled_checks/local/jolokia_metrics
Error in check include file /omd/sites/monitoring/local/share/check_mk/checks/jolokia_metrics: [Errno 2] No such file or directory: '/omd/sites/monitoring/local/share/check_mk/checks/jolokia_metrics'
Error in plugin file /omd/sites/monitoring/local/share/check_mk/checks/jolokia_generic: [Errno 2] No such file or directory: '/omd/sites/monitoring/local/share/check_mk/checks/jolokia_metrics'
Trying to acquire lock on /omd/sites/monitoring/var/check_mk/crashes/base/8ceb525a-c7b0-11ed-a9a4-005056a948fb/crash.info
Got lock on /omd/sites/monitoring/var/check_mk/crashes/base/8ceb525a-c7b0-11ed-a9a4-005056a948fb/crash.info
Releasing lock on /omd/sites/monitoring/var/check_mk/crashes/base/8ceb525a-c7b0-11ed-a9a4-005056a948fb/crash.info
Released lock on /omd/sites/monitoring/var/check_mk/crashes/base/8ceb525a-c7b0-11ed-a9a4-005056a948fb/crash.info
Traceback (most recent call last):
  File "/omd/sites/monitoring/bin/cmk", line 79, in <module>
    errors = config.load_all_agent_based_plugins(check_api.get_check_api_context)
  File "/omd/sites/monitoring/lib/python3/cmk/base/config.py", line 1528, in load_all_agent_based_plugins
    errors.extend(load_checks(get_check_api_context, filelist))
  File "/omd/sites/monitoring/lib/python3/cmk/base/config.py", line 1608, in load_checks
    did_compile |= load_check_includes(f, check_context)
  File "/omd/sites/monitoring/lib/python3/cmk/base/config.py", line 1725, in load_check_includes
    did_compile |= load_precompiled_plugin(include_file_path, check_context)
  File "/omd/sites/monitoring/lib/python3/cmk/base/config.py", line 1899, in load_precompiled_plugin
    py_compile.compile(path, precompiled_path, doraise=True)
  File "/omd/sites/monitoring/lib/python3.9/py_compile.py", line 142, in compile
    source_bytes = loader.get_data(file)
  File "<frozen importlib._bootstrap_external>", line 1039, in get_data
FileNotFoundError: [Errno 2] No such file or directory: '/omd/sites/monitoring/local/share/check_mk/checks/jolokia_metrics'

We even changed the host check interval to 4 minutes 30 seconds, but that did not help. The Checkmk agent logs show nothing unusual.

Thank you!

This file is your own or a modified version of the bundled Jolokia check.
It should also have been reported as a problem during the upgrade.
Please remove it and test the discovery again.
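To see which files under the site's local check directory are custom additions or overrides of shipped checks, a small listing script can compare the two directories. This is a minimal sketch, assuming it runs as the site user (so `OMD_ROOT` is set) and assuming shipped legacy checks live in `share/check_mk/checks` and local ones in `local/share/check_mk/checks` of the site, as the paths in the log above suggest. It only reports names; deciding what to remove is still a manual step.

```python
import os
from pathlib import Path
from typing import Iterable, List


def list_check_files(directory: Path) -> List[str]:
    """Return the plain file names in a check directory (empty if it is missing)."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


def shadowed_checks(local_names: Iterable[str], shipped_names: Iterable[str]) -> List[str]:
    """Local check files whose name matches a shipped check, i.e. overrides."""
    return sorted(set(local_names) & set(shipped_names))


if __name__ == "__main__":
    # Assumption: run as the site user, where OMD_ROOT points at the site directory;
    # the fallback path is the site seen in this thread's logs.
    site = Path(os.environ.get("OMD_ROOT", "/omd/sites/monitoring"))
    local = list_check_files(site / "local/share/check_mk/checks")
    shipped = list_check_files(site / "share/check_mk/checks")
    print("Local check files:", local)
    print("Local files overriding shipped checks:", shadowed_checks(local, shipped))
```

Files that appear only locally (such as a 1.6-era custom check) are as relevant as overrides, since both are loaded on every `cmk` run.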

Hi @andreas-doehler
Thanks for the quick answer.
The error message says "No such file or directory". I checked, and there is no jolokia_metrics file to remove; only jolokia_generic is there.
One strange thing: cmk --debug shows the same error for all hosts, but only this host gets stuck when running cmk -IIv.
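The log line "Error in plugin file .../jolokia_generic: ... jolokia_metrics" suggests that jolokia_metrics was requested as an include while the local jolokia_generic file was being loaded, which would explain an error about a file that does not exist. One way to check which includes a legacy check file declares is to parse it. This is a sketch under an assumption: the file uses the 1.6-style layout where includes are listed as an `"includes": [...]` entry in a `check_info` dict literal. Checkmk's own loader may detect includes differently, and the function names here are hypothetical.

```python
import ast
import sys
from pathlib import Path
from typing import List, Set


def declared_includes(source: str) -> List[str]:
    """Collect the string entries of every "includes": [...] key in dict literals.

    Assumes the legacy layout, e.g.
    check_info["foo"] = {"includes": ["foo.include"], ...}
    """
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        for key, value in zip(node.keys, node.values):
            # key is None for "**other" unpacking, which isinstance() rejects
            if (
                isinstance(key, ast.Constant)
                and key.value == "includes"
                and isinstance(value, (ast.List, ast.Tuple))
            ):
                found.extend(
                    elt.value
                    for elt in value.elts
                    if isinstance(elt, ast.Constant) and isinstance(elt.value, str)
                )
    return found


def missing_includes(source: str, available: Set[str]) -> List[str]:
    """Declared includes that are not among the available file names."""
    return [name for name in declared_includes(source) if name not in available]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 includes.py <path to local check file>
    check_file = Path(sys.argv[1])
    text = check_file.read_text()
    present = {p.name for p in check_file.parent.iterdir()}
    print("declared includes:", declared_includes(text))
    print("not present next to it:", missing_includes(text, present))
```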

I have now fixed the missing jolokia_metrics issue, and cmk --debug runs cleanly. The Checkmk server and the agent connect, but the transfer gets stuck while reading data. It looks like no data can be transferred even though the connection works.
The service scan fails as follows:

2023-03-21 09:06:04,952 [40] [cmk.web.background-job 32579] Exception in background function
Traceback (most recent call last):
  File "/omd/sites/monitoring/lib/python3/cmk/gui/watolib/automations.py", line 103, in check_mk_local_automation_serialized
    completed_process = subprocess.run(
  File "/omd/sites/monitoring/lib/python3.9/subprocess.py", line 507, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/omd/sites/monitoring/lib/python3.9/subprocess.py", line 1134, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/omd/sites/monitoring/lib/python3.9/subprocess.py", line 1979, in _communicate
    ready = selector.select(timeout)
  File "/omd/sites/monitoring/lib/python3.9/selectors.py", line 416, in select
    fd_event_list = self._selector.poll(timeout)
  File "/omd/sites/monitoring/lib/python3/cmk/gui/background_job.py", line 173, in _handle_sigterm
    raise MKTerminate()
cmk.utils.exceptions.MKTerminate

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/omd/sites/monitoring/lib/python3/cmk/gui/background_job.py", line 246, in _execute_function
    func_ptr(*args, **kwargs)
  File "/omd/sites/monitoring/lib/python3/cmk/gui/watolib/services.py", line 853, in discover
    self._perform_service_scan(api_request)
  File "/omd/sites/monitoring/lib/python3/cmk/gui/watolib/services.py", line 867, in _perform_service_scan
    try_discovery(
  File "/omd/sites/monitoring/lib/python3/cmk/gui/watolib/check_mk_automations.py", line 161, in try_discovery
    _automation_serialized(
  File "/omd/sites/monitoring/lib/python3/cmk/gui/watolib/check_mk_automations.py", line 67, in _automation_serialized
    cmdline, serialized_result = check_mk_local_automation_serialized(
  File "/omd/sites/monitoring/lib/python3/cmk/gui/watolib/automations.py", line 113, in check_mk_local_automation_serialized
    raise local_automation_failure(command=command, cmdline=cmd, exc=e)
cmk.utils.exceptions.MKGeneralException: Error running automation call try-inventory: 
Exception: Error running automation call try-inventory:

The error message looks like a timeout.
Do you see anything inside the agent log file on the Windows server?
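To confirm whether fetching the agent data by itself exceeds the 110-second limit from the GUI error, the command can be timed on the Checkmk server as the site user. A minimal sketch: `cmk -d <host>` only fetches and prints the agent output, so it separates the transfer from discovery (`cmk -IIv` additionally rewrites the discovered services). The host name `win2019-host` and the script name are hypothetical, and this is not how the GUI runs its automation calls internally.

```python
import subprocess
import sys
import time
from typing import List, Optional, Tuple


def run_with_deadline(cmd: List[str], timeout_s: float) -> Tuple[Optional[int], float]:
    """Run cmd and return (exit code, elapsed seconds); exit code is None on timeout."""
    start = time.monotonic()
    try:
        completed = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child before re-raising
        return None, time.monotonic() - start
    return completed.returncode, time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (as the site user): python3 timeit_cmd.py cmk -d win2019-host
    code, elapsed = run_with_deadline(sys.argv[1:], 110.0)
    print("timed out" if code is None else f"exit code {code}", f"after {elapsed:.1f} s")
```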

Hi @andreas-doehler,
I checked the log and compared it with another Windows server; everything looks normal.
Perhaps the issue is on the network side: the Checkmk server and the Windows host can connect via TCP/6556, but it looks like no data gets transmitted.
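An open port only proves that the TCP handshake succeeds. To tell "connected but nothing arrives" apart from "data arrives, just slowly", a probe run from the Checkmk server can record the connect time, the time to the first byte, and how many bytes arrive before a deadline. This is a sketch under assumptions: the agent answers in plaintext (legacy pull mode); if the 2.1 agent controller is registered, the port speaks TLS and this plain-socket probe will not see the normal output. The 110-second budget mirrors the timeout in the error message; the function names are hypothetical.

```python
import socket
import sys
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ProbeResult:
    connect_s: float               # seconds to establish the TCP connection
    first_byte_s: Optional[float]  # seconds from connect until the first byte (None: no data)
    received: int                  # total bytes received
    complete: bool                 # True if the agent closed the connection before the deadline


def drain(sock: socket.socket, budget_s: float) -> Tuple[Optional[float], int, bool]:
    """Read until EOF or until budget_s seconds elapse.

    Returns (seconds until first byte or None, bytes received, reached EOF).
    """
    start = time.monotonic()
    first: Optional[float] = None
    received = 0
    while True:
        remaining = budget_s - (time.monotonic() - start)
        if remaining <= 0:
            return first, received, False
        sock.settimeout(remaining)  # bound each recv() by the time that is left
        try:
            chunk = sock.recv(65536)
        except socket.timeout:
            return first, received, False
        if not chunk:  # empty read: the peer closed the connection (EOF)
            return first, received, True
        if first is None:
            first = time.monotonic() - start
        received += len(chunk)


def probe_agent(host: str, port: int = 6556, budget_s: float = 110.0) -> ProbeResult:
    """Connect to the agent port and measure how its output arrives."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=budget_s) as sock:
        connect_s = time.monotonic() - start
        first, received, complete = drain(sock, budget_s - connect_s)
    return ProbeResult(connect_s, first, received, complete)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 probe.py <windows-host>
    print(probe_agent(sys.argv[1]))
```

A result with `first_byte_s=None` points at data never arriving, while a steadily growing `received` with `complete=False` points at a transfer that is simply too slow for the deadline.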

I just found out that the agent test on the host itself works fine, but it produces a very large amount of data. When our Checkmk server pulls that data remotely, it gets stuck.
I suspect network bandwidth. Is there anything we can do to work around this?
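If the agent output is very large, it helps to know which sections make it large and what average throughput the transfer would need. This sketch reads a saved copy of the agent output (for example from the agent test on the host), sums the byte size per `<<<section>>>`, and computes the minimum throughput needed to deliver the total within 110 seconds. Assumptions: section headers have the form `<<<name>>>` with optional `:options`; piggyback markers like `<<<<host>>>>` are not treated as headers; applying the GUI's 110-second limit to the agent fetch itself is an assumption. The function and file names are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Dict, List, Tuple

# Matches section headers such as "<<<check_mk>>>" or "<<<df:sep(9)>>>".
_HEADER = re.compile(r"^<<<([^>:]+)(?::[^>]*)?>>>$")


def section_sizes(agent_output: str) -> Dict[str, int]:
    """Sum the UTF-8 byte size of every line per section (header lines included)."""
    sizes: Dict[str, int] = {}
    current = "(before first header)"
    for line in agent_output.splitlines(keepends=True):
        match = _HEADER.match(line.rstrip("\r\n"))
        if match:
            current = match.group(1)
        sizes[current] = sizes.get(current, 0) + len(line.encode("utf-8"))
    return sizes


def largest(sizes: Dict[str, int], n: int = 5) -> List[Tuple[str, int]]:
    """The n biggest sections, largest first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]


def required_kib_per_s(total_bytes: int, timeout_s: float) -> float:
    """Minimum average throughput (KiB/s) to move total_bytes within timeout_s."""
    return total_bytes / 1024 / timeout_s


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 sections.py agent_output.txt
    text = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
    sizes = section_sizes(text)
    for name, size in largest(sizes, 10):
        print(f"{size:>12} bytes  {name}")
    total = sum(sizes.values())
    print(f"total {total} bytes, needs at least {required_kib_per_s(total, 110.0):.1f} KiB/s for 110 s")
```

Sections that dominate the total are the first candidates to look at on the Windows side before assuming the network is the bottleneck.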
