CMK version:
This is Check_MK version 2.1.0p24 CRE
OMD version:
OMD - Open Monitoring Distribution Version 2.1.0p24.cre
OS version:
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
Environment information
Running as a StatefulSet in EKS/K8S
Problem statement:
In our current Checkmk site setup, I pull the list of hosts from a backend and write it to the hosts file at /omd/sites/aegismk/hosts, which makes the hosts appear in the UI. We have a backend service that generates Checkmk agent data and exposes it through a curl command. We are trying to configure that command in main.mk through the datasource_programs variable. However, Checkmk does not execute the curl command and falls back to an ICMP ping, which is not the desired behavior.
Question:
- Is this configuration in main.mk still valid? (The exact same file works on 1.2.6p16.) Is it the only configuration needed to get service/check/sensor information for each host from the backend?
- What else is needed to make these services available for each host?
- What is the currently recommended approach for fetching service information programmatically, locally on the Checkmk node?
- How can the curl-based response be used to provide the services for each host, instead of the PING service, which fails with this error:
check_icmp: Failed to obtain ICMP socket: Operation not permitted
I look forward to hearing from you and the experts. Thank you in advance!
Additional Information:
As shown below, I have configured main.mk to use datasource_programs, which calls our backend service to fetch the services for each host in the all_hosts list.
OMD[aegismk]:~$ cat etc/check_mk/main.mk
_user = os.environ["USER"]
all_hosts += [_host.rstrip('\n') for _host in open('/omd/sites/{}/hosts'.format(_user), 'r')]
# Hosts are currently always reported as up
host_check_commands += [
( 'ok', all_hosts )
]
extra_host_conf['alias']=[]
_aliases = [_alias.rstrip('\n') for _alias in open('/omd/sites/{}/host_aliases'.format(_user), 'r')]
for _host in _aliases:
    _pipe=_host.index('|')
    _alias=_host[:_pipe]
    _instance=_host[_pipe+1:]
    extra_host_conf['alias'].append((_alias,[_instance]))
ipaddresses = dict([(_ip.split('|')[0], '127.0.0.1') for _ip in all_hosts])
#Add checks for hosts
datasource_programs += [( 'curl BASE_URL/checkmk?inst=<HOST>', all_hosts )]
check_submission = 'pipe'
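To make the intent of this file easier to follow, here is a minimal standalone Python sketch of the parsing it performs. It is not Checkmk code: the function names are hypothetical, the sample alias line is made up for illustration, and `expand_command` only models our expectation that `<HOST>` in the datasource program is replaced by the host name.

```python
from typing import Dict, List, Tuple


def parse_hosts(lines: List[str]) -> List[str]:
    """Strip trailing newlines, like the all_hosts comprehension in main.mk."""
    return [line.rstrip("\n") for line in lines]


def parse_aliases(lines: List[str]) -> List[Tuple[str, List[str]]]:
    """Split each 'alias|instance' line at the first '|' into (alias, [instance]).

    Like main.mk, this raises ValueError when a line contains no '|'.
    """
    result = []
    for line in lines:
        entry = line.rstrip("\n")
        pipe = entry.index("|")
        result.append((entry[:pipe], [entry[pipe + 1:]]))
    return result


def build_ipaddresses(all_hosts: List[str]) -> Dict[str, str]:
    """Map the part of each host entry before the first '|' to 127.0.0.1."""
    return {host.split("|")[0]: "127.0.0.1" for host in all_hosts}


def expand_command(template: str, host: str) -> str:
    """Hypothetical helper: substitute the <HOST> placeholder (our assumption)."""
    return template.replace("<HOST>", host)


if __name__ == "__main__":
    hosts = parse_hosts(["pod807\n"])
    # "Pod 807|pod807" is an invented example of a host_aliases line.
    print(parse_aliases(["Pod 807|pod807\n"]))
    print(build_ipaddresses(hosts))
    for host in hosts:
        print(expand_command("curl BASE_URL/checkmk?inst=<HOST>", host))
```

For pod807 the expanded command is the same curl call shown in the backend response section of this post, which is the command we expect Checkmk to run for that host.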
Sample host configuration for host=pod807
OMD[aegismk]:~$ cmk -D pod807
pod807
Addresses: 127.0.0.1
Tags: [address_family:ip-v4-only], [agent:cmk-agent], [criticality:prod], [networking:lan], [piggyback:auto-piggyback], [pod:pod], [site:aegismk], [snmp_ds:no-snmp]
Labels: [cmk/site:aegismk]
Host groups: check_mk
Contact groups: all, check-mk-notify
Agent mode: No agent
Type of agent:
Process piggyback data from /omd/sites/aegismk/tmp/check_mk/piggyback/pod807
PING only
Services:
checktype item params description groups
--------- ---- ------ ----------- ------
Expected/Ideal output for host=pod807
OMD[aegismk]:~$ cmk -D pod807
pod807 (no DNS, no entry in ipaddresses)
Tags:
Host groups: prod, ecom
Contact groups: all, check-mk-notify
Type of agent: TCP (port: 6556)
Is aggregated: no
Services:
checktype item params description groups summarized to groups
--------- ----------------------------------------------------------------------- ------ ----------------------------------------------------------------------- ------ ------------- ------
local app.cpu.percent None app.cpu.percent
local db.cpu.percent None db.cpu.percent
local util.cpu.percent-user None util.cpu.percent-user
...
...
Output of "cmk --debug -vvn hostname": (If it is a problem with checks or plugins)
OMD[aegismk]:~$ cmk --debug -vvn pod807
Checkmk version 2.1.0p24
Try license usage history update.
Trying to acquire lock on /omd/sites/aegismk/var/check_mk/license_usage/next_run
Got lock on /omd/sites/aegismk/var/check_mk/license_usage/next_run
Trying to acquire lock on /omd/sites/aegismk/var/check_mk/license_usage/history.json
Got lock on /omd/sites/aegismk/var/check_mk/license_usage/history.json
Next run time has not been reached yet. Abort.
Releasing lock on /omd/sites/aegismk/var/check_mk/license_usage/history.json
Released lock on /omd/sites/aegismk/var/check_mk/license_usage/history.json
Releasing lock on /omd/sites/aegismk/var/check_mk/license_usage/next_run
Released lock on /omd/sites/aegismk/var/check_mk/license_usage/next_run
+ FETCHING DATA
Source: SourceType.HOST/FetcherType.PIGGYBACK
[cpu_tracking] Start [7fe0faff37f0]
[PiggybackFetcher] Fetch with cache settings: NoCache(pod807, base_path=/omd/sites/aegismk/tmp/check_mk/data_source_cache/piggyback, max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=True, use_outdated=False, simulation=False)
Not using cache (Cache usage disabled)
[PiggybackFetcher] Execute data source
No piggyback files for 'pod807'. Skip processing.
No piggyback files for '127.0.0.1'. Skip processing.
Not using cache (Cache usage disabled)
[cpu_tracking] Stop [7fe0faff37f0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
+ PARSE FETCHER RESULTS
Source: SourceType.HOST/FetcherType.PIGGYBACK
No persisted sections
-> Add sections: []
Received no piggyback data
[cpu_tracking] Start [7fe0fb008fd0]
value store: synchronizing
Trying to acquire lock on /omd/sites/aegismk/tmp/check_mk/counters/pod807
Got lock on /omd/sites/aegismk/tmp/check_mk/counters/pod807
value store: loading from disk
Releasing lock on /omd/sites/aegismk/tmp/check_mk/counters/pod807
Released lock on /omd/sites/aegismk/tmp/check_mk/counters/pod807
No piggyback files for 'pod807'. Skip processing.
No piggyback files for '127.0.0.1'. Skip processing.
[cpu_tracking] Stop [7fe0fb008fd0 - Snapshot(process=posix.times_result(user=0.010000000000000009, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
execution time 0.0 sec | execution_time=0.000 user_time=0.010 system_time=0.000 children_user_time=0.000 children_system_time=0.000 cmk_time_agent=0.000
Backend Response / Curl Command output
OMD[aegismk]:~$ curl BASE_URL/checkmk?inst=pod807
<<<check_mk>>>
Version: 2.1.0p24
AgentOS: linux
AgentDirectory: /etc/check_mk
DataDirectory: /var/lib/check_mk_agent
SpoolDirectory: /var/lib/check_mk_agent/spool
PluginsDirectory: /usr/lib/check_mk_agent/plugins
LocalDirectory: /usr/lib/check_mk_agent/local
<<<local>>>
P nginx.openfiles.master.prd value=922.0;80000.0;100000.0 Check nginx.openfiles.master.prd (Open files for master process https://grafana_/dashboard/db/nginx-global-stats?var-Pod=*&var-Realm=pod807&var-Instance=pod807 ) groupByNodes(pod807.infrastructure.pesslonly.*.openFiles.master, 'maxSeries', 0) >= 100000, GM: https://gm_url
...
...
...
...
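To sanity-check the backend response independently of Checkmk, the output can be split into sections and the local check lines parsed. The following is a minimal Python sketch under stated assumptions: each `<<<local>>>` line has the form `status name perfdata text`, perfdata is either `-` or `metric=value;warn;crit` entries joined by `|`, and service names contain no spaces. `split_sections`, `parse_local_line`, and `LocalCheck` are hypothetical names, not part of Checkmk, and the sample line is shortened from the response above.

```python
from typing import Dict, List, NamedTuple, Optional


class LocalCheck(NamedTuple):
    status: str
    name: str
    metrics: Dict[str, List[Optional[float]]]
    text: str


def split_sections(agent_output: str) -> Dict[str, List[str]]:
    """Group agent output lines under their <<<section>>> header."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in agent_output.splitlines():
        if line.startswith("<<<") and line.endswith(">>>"):
            # Drop header options such as ':sep(0)' if present.
            current = line[3:-3].split(":")[0]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def parse_local_line(line: str) -> LocalCheck:
    """Parse one local check line; raises ValueError if fields are missing."""
    status, name, perfdata, text = line.split(" ", 3)
    metrics: Dict[str, List[Optional[float]]] = {}
    if perfdata != "-":
        for item in perfdata.split("|"):
            metric, _, values = item.partition("=")
            # Empty threshold slots (e.g. 'v=1;;5') become None.
            metrics[metric] = [float(v) if v else None for v in values.split(";")]
    return LocalCheck(status, name, metrics, text)


SAMPLE = """<<<check_mk>>>
Version: 2.1.0p24
AgentOS: linux
<<<local>>>
P nginx.openfiles.master.prd value=922.0;80000.0;100000.0 Check nginx.openfiles.master.prd
"""

if __name__ == "__main__":
    for raw in split_sections(SAMPLE)["local"]:
        print(parse_local_line(raw))
```

If the real response parses cleanly this way, the data itself looks well-formed, and the remaining question is why Checkmk never calls the program.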