CMK version: 2.5.0p6 Community
OS version: Ubuntu 24.04.4
Error message in nagios.log
Warning: Check result queue contained results for service 'Unimplemented check services_summary' on host 'HOSTNAME', but the service could not be found! Perhaps you forgot to define the service in your config files?
The same warning appears for several other check types on the same hosts:
dotnet_clrmemory, esx_vsphere_vm_cpu, esx_vsphere_vm_heartbeat, esx_vsphere_vm_mem_usage, esx_vsphere_vm_mounted_devices, esx_vsphere_vm_name, esx_vsphere_objects_count.
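To get a per-host/per-check breakdown of how often the warning fires, a small Python sketch like the one below could be run over the log. Assumptions: the log uses plain ASCII single quotes around service and host names, and the log path is passed on the command line; `count_orphan_results` is just a name I made up.

```python
import re
import sys
from collections import Counter

# Matches the Nagios core warning quoted above (assumes plain ASCII single quotes).
WARNING_RE = re.compile(
    r"Check result queue contained results for service '(?P<service>[^']*)' "
    r"on host '(?P<host>[^']*)', but the service could not be found"
)

def count_orphan_results(lines):
    """Count 'service could not be found' warnings per (host, service) pair."""
    counts = Counter()
    for line in lines:
        m = WARNING_RE.search(line)
        if m:
            counts[(m.group("host"), m.group("service"))] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_warnings.py ~/var/log/nagios.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for (host, service), n in count_orphan_results(fh).most_common(20):
            print(f"{n:>8}  {host}  {service}")
```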
Description:
I have Windows/Linux hosts where certain services are intentionally excluded from active monitoring via the “Disabled services” ruleset (not single-service manual disabling, but proper rule-based exclusion by host label, e.g. cmk/os_family:windows).
Example rule:
Host matching labels: [cmk/os_family:windows]
Service name is ^DotNet Memory Management Global, ^ESX CPU, ^ESX Heartbeat, ^ESX Memory, ^ESX Mounted Devices, ^ESX Name, ^Object count or ^Service Summary
Value: Positive match (add services / hosts to the set)
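For reference, this is how I understand the service name conditions to be evaluated: each entry is a regular expression matched against the beginning of the service name, so the leading ^ is redundant but harmless. A minimal sketch of that matching, assuming Python's `re.match` approximates Checkmk's behaviour (`is_disabled` and the sample names are hypothetical):

```python
import re

# Patterns copied from the rule above.
# Assumption: each condition is a regex anchored at the start of the service name.
DISABLED_PATTERNS = [
    r"^DotNet Memory Management Global",
    r"^ESX CPU",
    r"^ESX Heartbeat",
    r"^ESX Memory",
    r"^ESX Mounted Devices",
    r"^ESX Name",
    r"^Object count",
    r"^Service Summary",
]

def is_disabled(service_name: str) -> bool:
    """True if any pattern matches the beginning of service_name (prefix match)."""
    return any(re.match(p, service_name) for p in DISABLED_PATTERNS)

# Hypothetical service names for illustration
for name in ["ESX CPU", "Service Summary", "CPU load", "Memory"]:
    print(f"{name!r:22} disabled={is_disabled(name)}")
```

If that understanding is right, ^ESX Name would also match any other service whose name merely starts with "ESX Name"; appending $ (e.g. ^ESX Name$) would pin exact names.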
These services show up correctly under “Disabled services” in the Service discovery page, with normal OK check output. Autochecks are present and correct - I checked the autochecks file directly:
{'check_plugin_name': 'esx_vsphere_vm_cpu', 'item': None, 'parameters': {}, 'service_labels': {}},
{'check_plugin_name': 'dotnet_clrmemory', 'item': 'Global', 'parameters': {'upper': (10.0, 15.0)}, 'service_labels': {}},
So the plugins are not orphaned/unimplemented in the literal sense - they exist, are discovered correctly, and produce valid output when run manually (cmk -v HOSTNAME works fine for these checks).
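Since the autochecks file is a Python literal, it can also be checked programmatically rather than by eye. A sketch, assuming the file lives at something like ~/var/check_mk/autochecks/HOSTNAME.mk inside the site (the path is my assumption for 2.x sites; `load_autochecks` and `plugins_with_items` are hypothetical helper names):

```python
import ast
from pathlib import Path

def load_autochecks(text: str) -> list:
    """Parse an autochecks file body: a Python list literal of dicts."""
    entries = ast.literal_eval(text)  # evaluates literals only, never executes code
    if not isinstance(entries, list):
        raise ValueError("expected a list literal")
    return entries

def load_autochecks_file(path: Path) -> list:
    # Hypothetical location, e.g. Path.home() / "var/check_mk/autochecks/HOSTNAME.mk"
    return load_autochecks(path.read_text(encoding="utf-8"))

def plugins_with_items(entries: list) -> list:
    """(check_plugin_name, item) pairs in file order."""
    return [(e["check_plugin_name"], e.get("item")) for e in entries]

# The two entries quoted above, with straight quotes
SAMPLE = """[
{'check_plugin_name': 'esx_vsphere_vm_cpu', 'item': None, 'parameters': {}, 'service_labels': {}},
{'check_plugin_name': 'dotnet_clrmemory', 'item': 'Global', 'parameters': {'upper': (10.0, 15.0)}, 'service_labels': {}},
]"""

print(plugins_with_items(load_autochecks(SAMPLE)))
# [('esx_vsphere_vm_cpu', None), ('dotnet_clrmemory', 'Global')]
```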
However, since these services are disabled, no corresponding Nagios service object is generated in the active config. The check itself still runs on every check cycle (it’s not excluded from execution, only from the generated config), and when its result reaches the core via the check result queue, Nagios can’t find a matching service object - hence the “Unimplemented check X … but the service could not be found” warning, repeated on every single check cycle for every affected host/check combination.
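To make the suspected mechanism concrete, here is a toy model (not Checkmk or Nagios code) of a core that only accepts results for services present in its generated config, while every executed check, disabled or not, keeps reporting each cycle. The service name "CPU load" and all helper names are hypothetical:

```python
from typing import Iterable, List, Set, Tuple

Service = Tuple[str, str]  # (host name, service description)

def process_results(configured: Set[Service], results: Iterable[Service]) -> List[str]:
    """Toy core: accept results for configured services, warn about all others."""
    warnings = []
    for host, service in results:
        if (host, service) not in configured:
            warnings.append(
                f"Warning: Check result queue contained results for service "
                f"'{service}' on host '{host}', but the service could not be found!"
            )
    return warnings

def simulate(configured: Set[Service], executed: List[Service], cycles: int) -> List[str]:
    """Assumption being modelled: every executed check reports on every cycle,
    even if its service was left out of the generated core config."""
    warnings: List[str] = []
    for _ in range(cycles):
        warnings.extend(process_results(configured, executed))
    return warnings

# Hypothetical example: one normal service plus two disabled ones on one host.
EXECUTED = [
    ("HOSTNAME", "CPU load"),
    ("HOSTNAME", "Unimplemented check services_summary"),
    ("HOSTNAME", "Unimplemented check esx_vsphere_vm_cpu"),
]
CONFIGURED = {EXECUTED[0]}  # disabled services get no service object

print(len(simulate(CONFIGURED, EXECUTED, cycles=3)))  # 2 orphaned results x 3 cycles -> 6
```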
With many hosts and ~7-8 disabled check types per host, this generates roughly 1-2.5 million warning lines per day in var/log/nagios.log and var/nagios/archive, filling the filesystem (/opt/omd) within roughly 10 days even with weekly log rotation/cleanup. We had to purge the archive manually and tighten the “Automatic disk space cleanup” global setting as a workaround.
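Back-of-envelope arithmetic showing how quickly this adds up, assuming each disabled check produces exactly one warning line per check cycle. All inputs are hypothetical (150 hosts, 8 disabled checks per host, the default 1-minute check interval, ~220 bytes per line as a rough estimate from the message length), not measured values from our site:

```python
SECONDS_PER_DAY = 86_400

def lines_per_day(hosts: int, disabled_checks_per_host: int, check_interval_s: int = 60) -> int:
    """Warning lines per day if every disabled check logs one warning per check cycle."""
    cycles_per_day = SECONDS_PER_DAY // check_interval_s
    return hosts * disabled_checks_per_host * cycles_per_day

def bytes_per_day(lines: int, avg_line_bytes: int) -> int:
    """Approximate log growth per day."""
    return lines * avg_line_bytes

def days_until_full(free_bytes: int, daily_bytes: int) -> float:
    """Days until free space is exhausted, ignoring rotation and cleanup."""
    return free_bytes / daily_bytes

# Hypothetical inputs, not measured on our system
n = lines_per_day(hosts=150, disabled_checks_per_host=8)
print(n)                      # 1728000
print(bytes_per_day(n, 220))  # 380160000 (~380 MB/day)
```

With those assumed inputs the result lands inside the 1-2.5 million lines/day range we actually observe.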
Steps to reproduce:
- Create a “Disabled services” rule excluding a check type (e.g. esx_vsphere_vm_cpu) for a group of hosts via a host label condition
- Run service discovery + activate changes
- Confirm the service appears under “Disabled services” with OK status in the discovery view
- Watch var/log/nagios.log - the warning appears on every check cycle for that host/check
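To confirm that the warning really fires once per check cycle, the gap between consecutive warnings for the same host/service pair can be measured. This sketch assumes the standard Nagios log format, where each line starts with a Unix timestamp in brackets (e.g. [1700000000]); `warning_intervals` is a hypothetical helper name:

```python
import re
import statistics
import sys
from collections import defaultdict

# Assumption: each nagios.log line starts with "[<unix timestamp>] ".
LINE_RE = re.compile(
    r"^\[(?P<ts>\d+)\] Warning: Check result queue contained results for service "
    r"'(?P<service>[^']*)' on host '(?P<host>[^']*)'"
)

def warning_intervals(lines):
    """Return {(host, service): [seconds between consecutive warnings]}."""
    last_seen = {}
    intervals = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        key = (m.group("host"), m.group("service"))
        ts = int(m.group("ts"))
        if key in last_seen:
            intervals[key].append(ts - last_seen[key])
        last_seen[key] = ts
    return dict(intervals)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python warning_gaps.py ~/var/log/nagios.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for (host, service), gaps in sorted(warning_intervals(fh).items()):
            print(f"{host}  {service}: median gap {statistics.median(gaps)} s")
```

A median gap equal to the check interval of the host's Check_MK service would support the "once per cycle" observation.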
What I already checked:
- Verified autochecks are not orphaned (plugin exists and runs correctly via cmk -v)
- Re-ran cmk -II --all (force rediscovery) - no change, warning persists
- Checked Werks #19805/#19806 (2.5.0p3, “Disabled services are no longer written to autochecks via the discovery backend” / “Remove all and find new now removes vanished services”) - these appear to address discovery-side autochecks writing, not this specific runtime symptom; we are already on p6, which includes these fixes, and the warning is unaffected
- Confirmed this only happens for checks under “Disabled services” rules - other disabled-via-SNMP-ruleset services (e.g. Disabled Interfaces monitoring on network devices) do not produce this warning
Question:
Is this expected behavior of “Disabled services” combined with the Nagios core (since the Nagios core requires a static config and, unlike CMC, can't tolerate results for unknown services)? Or is this a regression/bug specific to 2.5?
If expected: is the “Disabled checks” ruleset (which prevents discovery/execution entirely, rather than hiding the result) the documented/recommended approach instead of “Disabled services” when running on the Nagios core, to avoid this log flooding? I cannot find this distinction clearly documented anywhere.
Happy to provide more debug output if needed.
Thank you.