Unable to resolve services/checks for hosts

CMK version:
This is Check_MK version 2.1.0p24 CRE

OMD version:
OMD - Open Monitoring Distribution Version 2.1.0p24.cre

OS version:
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"

Environment information
Running as a StatefulSet in EKS/K8S

Problem statement:
In the current setup of my Checkmk site, I pull the host list from a backend and write it to the hosts file at /omd/sites/aegismk/hosts, which makes the hosts appear in the UI. A backend service generates Checkmk agent data for each host and exposes it over HTTP, so it can be fetched with a curl command. I am trying to configure that command in main.mk through the datasource_programs variable. However, Checkmk does not execute the curl command and instead falls back to an ICMP ping, which is not the desired behavior.
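
To make the intent concrete, here is a minimal sketch of what I expect to happen to the configured command line: the <HOST> placeholder is replaced by the host name before the command runs. The function name expand_host_placeholder is made up for illustration and is not a Checkmk API, and BASE_URL is the same placeholder used throughout this post.

```python
def expand_host_placeholder(command_template: str, hostname: str) -> str:
    """Return the command line with every <HOST> marker replaced by hostname."""
    return command_template.replace("<HOST>", hostname)


# Same template as in main.mk; BASE_URL stays a placeholder here.
template = "curl BASE_URL/checkmk?inst=<HOST>"
command = expand_host_placeholder(template, "pod807")
print(command)
```

For pod807 this should produce the same command that I run by hand in the "Backend Response / Curl Command output" part of this post.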

Question:

  1. Is this configuration in main.mk still valid? The exact same file works on 1.2.6p16. Is it the only configuration needed to get service/check/sensor information for each host from the backend?
  2. What else is needed to make these services available for each host?
  3. Any suggestions on the most recent approach to fetching service information programmatically, locally on the Checkmk node?
  4. How can I map the curl-based response to services for each host, instead of the PING service, which fails with the error: check_icmp: Failed to obtain ICMP socket: Operation not permitted

I look forward to hearing from you/the experts. Thank you in advance!

Additional Information:
As shown in the file below, I have configured datasource_programs in main.mk to call our backend service and fetch the services for each host in the all_hosts list.

OMD[aegismk]:~$ cat etc/check_mk/main.mk
_user = os.environ["USER"]
all_hosts += [_host.rstrip('\n') for _host in open('/omd/sites/{}/hosts'.format(_user), 'r')]

# Hosts currently always report as up
host_check_commands += [
	( 'ok', all_hosts )
]

extra_host_conf['alias']=[]
_aliases = [_alias.rstrip('\n') for _alias in open('/omd/sites/{}/host_aliases'.format(_user), 'r')]

for _host in _aliases:
    _pipe=_host.index('|')
    _alias=_host[:_pipe]
    _instance=_host[_pipe+1:]
    extra_host_conf['alias'].append((_alias,[_instance]))

ipaddresses = dict([(_ip.split('|')[0], '127.0.0.1') for _ip in all_hosts])

#Add checks for hosts
datasource_programs += [( 'curl BASE_URL/checkmk?inst=<HOST>', all_hosts )]

check_submission = 'pipe'
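
For anyone who wants to reproduce the file handling outside of Checkmk, the following is a standalone sketch of the parsing that the main.mk above performs on the hosts and host_aliases files. The helper names (read_lines, parse_aliases, build_ipaddresses) and the sample file contents are assumptions made for illustration. Unlike the original, it closes the files explicitly and skips empty lines.

```python
import tempfile
from pathlib import Path


def read_lines(path):
    """Read a file and return its non-empty lines without trailing newlines."""
    with open(path, "r") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def parse_aliases(lines):
    """Turn 'alias|instance' lines into (alias, [instance]) tuples.

    Splits on the first pipe only, like the index() based code in main.mk.
    """
    entries = []
    for entry in lines:
        alias, separator, instance = entry.partition("|")
        if not separator:
            raise ValueError("missing '|' in alias line: {!r}".format(entry))
        entries.append((alias, [instance]))
    return entries


def build_ipaddresses(hosts):
    """Map every host name (the part before any '|') to 127.0.0.1."""
    return {host.split("|")[0]: "127.0.0.1" for host in hosts}


# Hypothetical sample data standing in for the real site files.
with tempfile.TemporaryDirectory() as site_dir:
    hosts_file = Path(site_dir) / "hosts"
    aliases_file = Path(site_dir) / "host_aliases"
    hosts_file.write_text("pod807\npod808\n")
    aliases_file.write_text("Pod 807|pod807\n")

    hosts = read_lines(hosts_file)
    aliases = parse_aliases(read_lines(aliases_file))
    ipaddresses = build_ipaddresses(hosts)

print(hosts)
print(aliases)
print(ipaddresses)
```

This only mirrors the list and dict construction; it says nothing about whether Checkmk 2.1 still accepts these variables in this form, which is exactly my question.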

Sample host configuration for host=pod807

OMD[aegismk]:~$ cmk -D pod807

pod807
Addresses:              127.0.0.1
Tags:                   [address_family:ip-v4-only], [agent:cmk-agent], [criticality:prod], [networking:lan], [piggyback:auto-piggyback], [pod:pod], [site:aegismk], [snmp_ds:no-snmp]
Labels:                 [cmk/site:aegismk]
Host groups:            check_mk
Contact groups:         all, check-mk-notify
Agent mode:             No agent
Type of agent:
  Process piggyback data from /omd/sites/aegismk/tmp/check_mk/piggyback/pod807
  PING only
Services:
  checktype item params description groups
  --------- ---- ------ ----------- ------

Expected/Ideal output for host=pod807

OMD[aegismk]:~$ cmk -D pod807

pod807 (no DNS, no entry in ipaddresses)
Tags:
Host groups:            prod, ecom
Contact groups:         all, check-mk-notify
Type of agent:          TCP (port: 6556)
Is aggregated:          no
Services:
  checktype item                                                                    params description                                                             groups summarized to groups
  --------- ----------------------------------------------------------------------- ------ ----------------------------------------------------------------------- ------ ------------- ------
  local     app.cpu.percent                                                    None   app.cpu.percent
  local     db.cpu.percent                                                     None   db.cpu.percent
  local     util.cpu.percent-user                                              None   util.cpu.percent-user
   ...
   ...

Output of "cmk --debug -vvn hostname": (If it is a problem with checks or plugins)

OMD[aegismk]:~$ cmk --debug -vvn pod807
Checkmk version 2.1.0p24
Try license usage history update.
Trying to acquire lock on /omd/sites/aegismk/var/check_mk/license_usage/next_run
Got lock on /omd/sites/aegismk/var/check_mk/license_usage/next_run
Trying to acquire lock on /omd/sites/aegismk/var/check_mk/license_usage/history.json
Got lock on /omd/sites/aegismk/var/check_mk/license_usage/history.json
Next run time has not been reached yet. Abort.
Releasing lock on /omd/sites/aegismk/var/check_mk/license_usage/history.json
Released lock on /omd/sites/aegismk/var/check_mk/license_usage/history.json
Releasing lock on /omd/sites/aegismk/var/check_mk/license_usage/next_run
Released lock on /omd/sites/aegismk/var/check_mk/license_usage/next_run
+ FETCHING DATA
  Source: SourceType.HOST/FetcherType.PIGGYBACK
[cpu_tracking] Start [7fe0faff37f0]
[PiggybackFetcher] Fetch with cache settings: NoCache(pod807, base_path=/omd/sites/aegismk/tmp/check_mk/data_source_cache/piggyback, max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=True, use_outdated=False, simulation=False)
Not using cache (Cache usage disabled)
[PiggybackFetcher] Execute data source
No piggyback files for 'pod807'. Skip processing.
No piggyback files for '127.0.0.1'. Skip processing.
Not using cache (Cache usage disabled)
[cpu_tracking] Stop [7fe0faff37f0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
+ PARSE FETCHER RESULTS
  Source: SourceType.HOST/FetcherType.PIGGYBACK
No persisted sections
  -> Add sections: []
Received no piggyback data
[cpu_tracking] Start [7fe0fb008fd0]
value store: synchronizing
Trying to acquire lock on /omd/sites/aegismk/tmp/check_mk/counters/pod807
Got lock on /omd/sites/aegismk/tmp/check_mk/counters/pod807
value store: loading from disk
Releasing lock on /omd/sites/aegismk/tmp/check_mk/counters/pod807
Released lock on /omd/sites/aegismk/tmp/check_mk/counters/pod807
No piggyback files for 'pod807'. Skip processing.
No piggyback files for '127.0.0.1'. Skip processing.
[cpu_tracking] Stop [7fe0fb008fd0 - Snapshot(process=posix.times_result(user=0.010000000000000009, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
execution time 0.0 sec | execution_time=0.000 user_time=0.010 system_time=0.000 children_user_time=0.000 children_system_time=0.000 cmk_time_agent=0.000

Backend Response / Curl Command output


OMD[aegismk]:~$ curl BASE_URL/checkmk?inst=pod807
<<<check_mk>>>
Version: 2.1.0p24
AgentOS: linux
AgentDirectory: /etc/check_mk
DataDirectory: /var/lib/check_mk_agent
SpoolDirectory: /var/lib/check_mk_agent/spool
PluginsDirectory: /usr/lib/check_mk_agent/plugins
LocalDirectory: /usr/lib/check_mk_agent/local
<<<local>>>
P nginx.openfiles.master.prd value=922.0;80000.0;100000.0 Check nginx.openfiles.master.prd (Open files for master process  https://grafana_/dashboard/db/nginx-global-stats?var-Pod=*&var-Realm=pod807&var-Instance=pod807 )  groupByNodes(pod807.infrastructure.pesslonly.*.openFiles.master, 'maxSeries', 0) >= 100000, GM: https://gm_url
...
...
...
...
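
To sanity-check the backend response independently of Checkmk, a rough sketch like the one below splits the output into its <<<section>>> blocks and breaks a local check line into status, service name, metrics and detail text. The function names are hypothetical, the sample is a shortened version of the output above, and the parser assumes service names without spaces (quoted names are not handled).

```python
import re

# Matches section headers such as <<<check_mk>>> or <<<local>>>.
SECTION_HEADER = re.compile(r"^<<<([^>]+)>>>$")


def split_sections(agent_output):
    """Group agent output lines by the section header that precedes them."""
    sections = {}
    current = None
    for line in agent_output.splitlines():
        match = SECTION_HEADER.match(line.strip())
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


def parse_local_line(line):
    """Split a local check line into status, name, metrics and detail."""
    status, name, raw_metrics, detail = line.split(None, 3)
    metrics = {}
    if raw_metrics != "-":
        for metric in raw_metrics.split("|"):
            label, _, values = metric.partition("=")
            # Kept as strings: value, then optional warn and crit levels.
            metrics[label] = values.split(";")
    return {"status": status, "name": name, "metrics": metrics, "detail": detail}


# Shortened sample based on the curl output above.
sample = """<<<check_mk>>>
Version: 2.1.0p24
AgentOS: linux
<<<local>>>
P nginx.openfiles.master.prd value=922.0;80000.0;100000.0 Check nginx.openfiles.master.prd
"""

sections = split_sections(sample)
parsed = parse_local_line(sections["local"][0])
print(list(sections))
print(parsed)
```

If the response parses cleanly this way, the data itself is probably fine and the problem is more likely in how the data source is configured for the host.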
