Local Check not being discovered in WATO

CMK version:
Checkmk Enterprise Edition 2.2.0p12
OS version:
Ubuntu 22.04.4 LTS
Error message:
None
Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

Checkmk version 2.2.0p21
+ FETCHING DATA
  Source: SourceInfo(hostname='API-Monitoring', ipaddress='XXX.XXX.XXX.247', ident='agent', fetcher_type=<FetcherType.TCP: 8>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7fbc425b20d0]
Read from cache: AgentFileCache(API-Monitoring, path_template=/omd/sites/monitoring_intern/tmp/check_mk/cache/{hostname}, max_age=MaxAge(checking=0, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=6)
Not using cache (Too old. Age is 8 sec, allowed is 0 sec)
[TCPFetcher] Execute data source
Connecting via TCP to XXX.XXX.XXX.247:6556 (5.0s timeout)
Detected transport protocol: TransportProtocol.TLS (b'16')
Reading data from agent via TLS socket
Reading data from agent
Detected transport protocol: TransportProtocol.PLAIN (b'<<')
Closing TCP connection to XXX.XXX.XXX.247:6556
Write data to cache file /omd/sites/monitoring_intern/tmp/check_mk/cache/API-Monitoring
Trying to acquire lock on /omd/sites/monitoring_intern/tmp/check_mk/cache/API-Monitoring
Got lock on /omd/sites/monitoring_intern/tmp/check_mk/cache/API-Monitoring
Releasing lock on /omd/sites/monitoring_intern/tmp/check_mk/cache/API-Monitoring
Released lock on /omd/sites/monitoring_intern/tmp/check_mk/cache/API-Monitoring
[cpu_tracking] Stop [7fbc425b20d0 - Snapshot(process=posix.times_result(user=0.010000000000000009, system=0.0, children_user=0.0, children_system=0.0, elapsed=4.6099999994039536))]
  Source: SourceInfo(hostname='API-Monitoring', ipaddress='XXX.XXX.XXX.247', ident='piggyback', fetcher_type=<FetcherType.PIGGYBACK: 4>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7fbc4353ea10]
Read from cache: NoCache(API-Monitoring, path_template=/dev/null, max_age=MaxAge(checking=0.0, discovery=0.0, inventory=0.0), simulation=False, use_only_cache=False, file_cache_mode=1)
[PiggybackFetcher] Execute data source
No piggyback files for 'API-Monitoring'. Skip processing.
No piggyback files for 'XXX.XXX.XXX.247'. Skip processing.
[cpu_tracking] Stop [7fbc4353ea10 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.009999997913837433))]
+ PARSE FETCHER RESULTS
<<<check_mk>>> / Transition NOOPParser -> HostSectionParser
<<<cmk_agent_ctl_status:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<checkmk_agent_plugins_lnx:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<labels:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<df_v2>>> / Transition HostSectionParser -> HostSectionParser
<<<df_v2>>> / Transition HostSectionParser -> HostSectionParser
<<<systemd_units>>> / Transition HostSectionParser -> HostSectionParser
<<<nfsmounts_v2:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<cifsmounts>>> / Transition HostSectionParser -> HostSectionParser
<<<mounts>>> / Transition HostSectionParser -> HostSectionParser
<<<ps_lnx>>> / Transition HostSectionParser -> HostSectionParser
<<<mem>>> / Transition HostSectionParser -> HostSectionParser
<<<cpu>>> / Transition HostSectionParser -> HostSectionParser
<<<uptime>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<tcp_conn_stats>>> / Transition HostSectionParser -> HostSectionParser
<<<multipath>>> / Transition HostSectionParser -> HostSectionParser
<<<diskstat>>> / Transition HostSectionParser -> HostSectionParser
<<<kernel>>> / Transition HostSectionParser -> HostSectionParser
<<<md>>> / Transition HostSectionParser -> HostSectionParser
<<<vbox_guest>>> / Transition HostSectionParser -> HostSectionParser
<<<job>>> / Transition HostSectionParser -> HostSectionParser
<<<timesyncd>>> / Transition HostSectionParser -> HostSectionParser
<<<timesyncd_ntpmessage:sep(10)>>> / Transition HostSectionParser -> HostSectionParser
<<<local:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<cmk_update_agent_status:cached(1713857796,14400):sep(0)>>> / Transition HostSectionParser -> HostSectionParser
  HostKey(hostname='API-Monitoring', source_type=<SourceType.HOST: 1>)  -> Add sections: ['check_mk', 'checkmk_agent_plugins_lnx', 'cifsmounts', 'cmk_agent_ctl_status', 'cmk_update_agent_status', 'cpu', 'df_v2', 'diskstat', 'job', 'kernel', 'labels', 'lnx_if', 'local', 'md', 'mem', 'mounts', 'multipath', 'nfsmounts_v2', 'ps_lnx', 'systemd_units', 'tcp_conn_stats', 'timesyncd', 'timesyncd_ntpmessage', 'uptime', 'vbox_guest']
  HostKey(hostname='API-Monitoring', source_type=<SourceType.HOST: 1>)  -> Add sections: []
Received no piggyback data
[cpu_tracking] Start [7fbc424b8590]
value store: synchronizing
Trying to acquire lock on /omd/sites/monitoring_intern/tmp/check_mk/counters/API-Monitoring
Got lock on /omd/sites/monitoring_intern/tmp/check_mk/counters/API-Monitoring
value store: loading from disk
Releasing lock on /omd/sites/monitoring_intern/tmp/check_mk/counters/API-Monitoring
Released lock on /omd/sites/monitoring_intern/tmp/check_mk/counters/API-Monitoring
Acronis Cloud Backup - REDACTED, Machine: REDACTED Backup Failed for Customer: REDACTED, Details: BackupFailed, Machine: REDACTED.
Acronis Cloud Backup - REDACTED, Machine: REDACTED Backup Failed for Customer: REDACTED, Details: BackupFailed, Machine: REDACTED.
CPU load             15 min load: 0.01, 15 min load per core: 0.01 (2 cores)
CPU utilization      Total CPU: 21.09%
Check_MK Agent       Version: 2.2.0p12, OS: linux, Last update: Apr 19 2024 09:29:34, Agent plugins: 1, Local checks: 3
Disk IO SUMMARY      Read: 0.00 B/s, Write: 14.2 kB/s, Latency: 1 millisecond
Filesystem /         Used: 40.01% - 7.21 GiB of 18.0 GiB, trend per 1 day 0 hours: +44.0 MiB, trend per 1 day 0 hours: +0.24%, Time left until disk full: 251 days 15 hours
Filesystem /boot     Used: 19.05% - 371 MiB of 1.90 GiB, trend per 1 day 0 hours: +16.1 KiB, trend per 1 day 0 hours: +<0.01%, Time left until disk full: 274 years 231 days
Filesystem /boot/efi Used: 0.57% - 6.07 MiB of 1.05 GiB, trend per 1 day 0 hours: +0 B, trend per 1 day 0 hours: +0%
Interface 2          [eth0], (up), MAC: 00:15:5D:79:05:0B, Speed: 1 GBit/s, In: 22.5 kB/s (0.02%), Out: 12.5 kB/s (0.01%)
Kernel Performance   Process Creations: 21.85/s, Context Switches: 682.23/s, Major Page Faults: 0.08/s, Page Swap in: 0.00/s, Page Swap Out: 0.00/s
Memory               Total virtual memory: 6.03% - 238 MiB of 3.85 GiB, 9 additional details available
Mount options of /   Mount options exactly as expected
Mount options of /boot Mount options exactly as expected
Mount options of /boot/efi Mount options exactly as expected
Number of threads    150, Usage: 1.05%
Systemd Service Summary Total: 147, Disabled: 3, Failed: 0
Systemd Socket Summary Total: 22, Disabled: 0, Failed: 0
Systemd Timesyncd Time Offset: 2 milliseconds, Time since last sync: 28 minutes 39 seconds, Time since last NTPMessage: 28 minutes 39 seconds, Stratum: 2.00, Jitter: 2 milliseconds, Synchronized on 185.125.190.57
TCP Connections      Established: 2
Uptime               Up since Apr 19 2024 09:22:39, Uptime: 4 days 0 hours
No piggyback files for 'API-Monitoring'. Skip processing.
No piggyback files for 'XXX.XXX.XXX.247'. Skip processing.
[cpu_tracking] Stop [7fbc424b8590 - Snapshot(process=posix.times_result(user=0.010000000000000009, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.010000001639127731))]
[agent] Success, [piggyback] Success (but no data found for this host), execution time 4.6 sec | execution_time=4.630 user_time=0.020 system_time=0.000 children_user_time=0.000 children_system_time=0.000 cmk_time_agent=4.610

I'm currently having the problem that my new local check, written in Python, is executed by check_mk_agent but is not being discovered in WATO. I've already written a local check which creates the “Acronis Cloud Backup” services above. Running the script as root from the /usr/lib/check_mk_agent/local/ directory seems to work fine, and the output also looks good:

root@srv11snexechost:/usr/lib/check_mk_agent/local# ls -lisa
total 28
934705 4 drwxrwxrwx 3 root root 4096 Apr 23 14:40 .
934704 4 drwxr-xr-x 4 root root 4096 Apr 11 10:23 ..
919010 4 -rwxr-x--x 1 root root 2323 Apr 19 09:27 acronis_failed_backups.py
919004 4 drwxrwxrwx 2 root root 4096 Apr 23 14:40 dfe-configs
938787 4 -rwxr-x--x 1 root root 2359 Apr 19 07:52 encrypt.py
938788 8 -rwxr-x--x 1 root root 4280 Apr 23 14:38 ms365_dfe_incidents.py
root@srv11snexechost:/usr/lib/check_mk_agent/local# ./ms365_dfe_incidents.py
2 "DFE Incident 16 - REDACTED" - Multi-stage incident on one endpoint, Severity: medium
2 "DFE Incident 17 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
2 "DFE Incident 14 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
2 "DFE Incident 10 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
2 "DFE Incident 9 - REDACTED" - Horizontal port scan initiated on one endpoint, Severity: low
2 "DFE Incident 7 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
2 "DFE Incident 6 - REDACTED" - Horizontal port scan initiated on one endpoint, Severity: low
1 "DFE Incident 18 - REDACTED" - Horizontal port scan initiated on one endpoint, Severity: low
1 "DFE Incident 13 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
1 "DFE Incident 11 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
1 "DFE Incident 8 - REDACTED" - Horizontal port scan initiated on one endpoint, Severity: low
1 "DFE Incident 5 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
1 "DFE Incident 15 - REDACTED" - Defense evasion incident on one endpoint, Severity: informational
root@srv11snexechost:/usr/lib/check_mk_agent/local#
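A line that does not follow the local check format will not produce the intended service, so a quick mechanical check of the output can rule that out. The sketch below assumes the layout `<state> <service name> <metrics> <summary>`, with state 0, 1, 2, 3 or P, a service name that is either double-quoted or contains no spaces, and `-` when there are no metrics. The helper name validate_local_output is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: report local check output lines that do not look like
# "<state> <service name> <metrics> <summary>".
# Assumptions: state is 0, 1, 2, 3 or P; the service name is either
# "double quoted" or contains no spaces; metrics are "-" or a non-empty token.
validate_local_output() {
  local re='^([0-3]|P)[[:space:]]+("[^"]+"|[^[:space:]"]+)[[:space:]]+[^[:space:]]+'
  local line bad=0
  while IFS= read -r line; do
    [[ -z "$line" ]] && continue          # ignore empty lines
    if ! [[ "$line" =~ $re ]]; then
      printf 'suspicious line: %s\n' "$line"
      bad=1
    fi
  done
  return "$bad"                           # non-zero if any line looked wrong
}

# Usage: pipe the check's output through the validator, e.g.
#   ./ms365_dfe_incidents.py | validate_local_output && echo "format looks fine"
```

For the output shown above, every line should pass; a format problem is therefore unlikely to be the cause here.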

This is the output of running check_mk_agent as root on the host (only the <<<local>>> part):

<<<local:sep(0)>>>
2 "Acronis Cloud Backup - REDACTED, Machine: REDACTED" - Backup Failed for Customer: REDACTED, Details: BackupFailed, Machine: REDACTED.
2 "Acronis Cloud Backup - REDACTED, Machine: REDACTED" - Backup Failed for Customer: REDACTED, Details: BackupFailed, Machine: REDACTED.
2 "DFE Incident 16 - REDACTED" - Multi-stage incident on one endpoint, Severity: medium
2 "DFE Incident 17 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
2 "DFE Incident 14 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
2 "DFE Incident 10 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
2 "DFE Incident 9 - REDACTED" - Horizontal port scan initiated on one endpoint, Severity: low
2 "DFE Incident 7 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
2 "DFE Incident 6 - REDACTED" - Horizontal port scan initiated on one endpoint, Severity: low
1 "DFE Incident 18 - REDACTED" - Horizontal port scan initiated on one endpoint, Severity: low
1 "DFE Incident 13 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
1 "DFE Incident 11 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
1 "DFE Incident 8 - REDACTED" - Horizontal port scan initiated on one endpoint, Severity: low
1 "DFE Incident 5 - REDACTED" - 'ProductKey' hacktool was prevented on one endpoint, Severity: low
1 "DFE Incident 15 - REDACTED" - Defense evasion incident on one endpoint, Severity: informational
<<<cmk_update_agent_status:sep(0):cached(1713933675,14400)>>>
{"last_check": 1713933675.3045506, "last_update": 1713876004.014543, "aghash": "c64ee72fa0abd2f0", "pending_hash": null, "update_url": "https://REDACTED/main/check_mk", "trusted_certs": {"0": {"corrupt": false, "not_after": "20250615132624Z", "signature_algorithm": "sha512WithRSAEncryption", "common_name": "main_key"}}, "error": null}
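Running check_mk_agent from a root shell means the local checks inherit that shell's environment, so this output is not necessarily what the Checkmk server receives. A sketch for comparing the two, assuming a POSIX awk: the made-up helper extract_local_section prints only the body of the local section. On the Checkmk server, `cmk -d <hostname>` run as the site user dumps the raw agent output as fetched over the network, so piping both outputs through the helper should reveal any lines that only appear in the shell run.

```shell
#!/bin/bash
# Print only the body of the <<<local...>>> section(s) from agent output on stdin.
# Headers such as <<<local:sep(0)>>> are matched by prefix, so options
# after "local" do not matter.
extract_local_section() {
  awk '
    /^<<<local[:>]/ { in_local = 1; next }   # start of a local section
    /^<<</          { in_local = 0 }         # any other section header ends it
    in_local                                  # print lines inside the section
  '
}

# Usage, e.g.:
#   on the host:    check_mk_agent | extract_local_section
#   on the server:  cmk -d API-Monitoring | extract_local_section
```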

Any idea what this could be?

If it matters, the script is deployed via the Agent Bakery and the “Deploy custom file with agent” rule set.

Regards
Kean

Check for differences between the environment the agent runs in and that of your shell.

I’m sorry, but I don’t quite understand what I’m supposed to be checking. Are we talking about environment variables? My script does indeed use environment variables to decrypt some information for execution. But since both the root user and my “normal” user can access and run the script fine, and the output stays the same, I don’t really think that is the issue. Or am I misunderstanding you and supposed to be checking something else?

Sorry, I’m still pretty new to Checkmk/Linux, so please excuse any misinformation or misinterpretation.

Regards
Kean

Exactly. Use a script like this local check to dump the environment from the agent execution, then run your script from a shell where you explicitly set the same environment.

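#!/bin/bash
# Dump the environment this local check is executed with by the agent
env > /tmp/agent_env.txt
# Print one static OK service so the script is still a valid local check
echo "0 \"My 1st service\" - This static service is always OK"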
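A minimal sketch of the second half of that advice, assuming the dump file contains one NAME=value per line as written by env: the hypothetical helper run_with_env_file starts a command with only those variables, which should reproduce the agent's conditions closely enough to make the script fail the same way it does under the agent.

```shell
#!/bin/bash
# Run a command with exactly the environment recorded in an env dump file
# (one NAME=value per line, as written by "env > /tmp/agent_env.txt").
# Limitation: values containing newlines are not handled; stray lines
# without a NAME= prefix are skipped so env does not treat them as the command.
run_with_env_file() {
  local env_file=$1; shift
  local -a vars=()
  local line re='^[A-Za-z_][A-Za-z0-9_]*='
  while IFS= read -r line; do
    [[ "$line" =~ $re ]] && vars+=("$line")
  done < "$env_file"
  # env -i clears the inherited environment, so only the dumped variables are set.
  env -i "${vars[@]}" "$@"
}

# Usage on the monitored host (as root), e.g.:
#   run_with_env_file /tmp/agent_env.txt /usr/lib/check_mk_agent/local/ms365_dfe_incidents.py
```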

Okay, so analyzing the agent_env.txt file, I can see that my environment variable is indeed missing.
I had previously set the environment variable on the host by editing the /etc/environment file and applying it via source /etc/environment.
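On Ubuntu, /etc/environment is normally read by pam_env when a PAM session such as an SSH login starts; a systemd service like the agent does not go through such a login, which would explain why the variable exists in a root shell but not in the agent's dump. A sketch for spotting such differences, assuming the dump file from the local check above; the helper name missing_in_agent_env is made up for illustration.

```shell
#!/bin/bash
# List variable names that exist in the current shell's environment but are
# missing from an env dump taken by the agent (e.g. /tmp/agent_env.txt).
missing_in_agent_env() {
  local agent_dump=$1
  # comm -13 prints lines that appear only in the second (sorted) input.
  LC_ALL=C comm -13 \
    <(grep -oE '^[A-Za-z_][A-Za-z0-9_]*=' "$agent_dump" | tr -d '=' | LC_ALL=C sort -u) \
    <(env | grep -oE '^[A-Za-z_][A-Za-z0-9_]*=' | tr -d '=' | LC_ALL=C sort -u)
}

# Usage, in the shell where the check works:
#   missing_in_agent_env /tmp/agent_env.txt
```

Interactive-only variables such as TERM will show up as well; the interesting ones are those the check actually reads.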

I always thought that the root user executes check_mk_agent and the Checkmk server just pulls that info via port 6556, but I guess there’s a lot more to it than that.

How would I explicitly set the environment so that Checkmk can use it as well?

Regards
Kean

The Checkmk agent defines its rather stripped-down environment in the systemd unit file. So for running Ruby and Python scripts, I prefer keeping these scripts outside the Checkmk agent folders and using a shell script as a wrapper that defines the environment or runs something with su (for example, when a certain user and their pyenv/rbenv are needed).
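A sketch of such a wrapper, assuming the real check lives at a hypothetical /opt/mychecks/my_check.py and needs nothing beyond a basic PATH, LANG and HOME. The helper name, path and service name are placeholders, not part of Checkmk; variables your check needs would be added to the env -i list.

```shell
#!/bin/bash
# Hypothetical local check wrapper: the real check lives outside the agent
# folders, and the wrapper sets up everything it needs instead of relying
# on the agent's stripped-down service environment.
CHECK_SCRIPT=/opt/mychecks/my_check.py        # assumption: location of the real check
SERVICE_NAME="My Python check"                # assumption: name for the fallback service

run_in_defined_env() {
  # Start from an empty environment and pass on only what the check needs;
  # add further NAME=value pairs for variables your check reads.
  env -i \
    PATH=/usr/local/bin:/usr/bin:/bin \
    LANG=C.UTF-8 \
    HOME=/root \
    "$@"
}

if [[ -f "$CHECK_SCRIPT" ]]; then
  run_in_defined_env python3 "$CHECK_SCRIPT"
else
  # Report the problem as an UNKNOWN service instead of printing nothing.
  echo "3 \"$SERVICE_NAME\" - $CHECK_SCRIPT not found"
fi
```

When a specific user's pyenv/rbenv is needed, the python3 line could instead go through su, e.g. `su - someuser -c '...'`, with someuser as a placeholder.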

You need to build the required environment inside your script. For example, you should be able to do a “source /etc/environment” inside your script. This should add the environment variables you need.
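One caveat: source on its own only creates shell variables from the NAME=value lines, and a python3 process started afterwards only sees them if they are exported. A sketch using set -a (allexport), so every assignment read from the file is exported automatically. It assumes /etc/environment contains only simple NAME=value or NAME="value" lines, since that file uses pam_env syntax rather than full shell syntax; the function name is illustrative.

```shell
#!/bin/bash
# Source a file of NAME=value lines (such as /etc/environment) and export
# every variable it defines, so child processes like python3 inherit them.
load_env_file() {
  local file=$1
  [[ -r "$file" ]] || return 1
  set -a          # allexport: mark every variable assigned from now on for export
  # shellcheck disable=SC1090
  source "$file"
  set +a          # back to normal behaviour
}

# Usage in the wrapper, e.g.:
#   load_env_file /etc/environment
#   python3 /api-monitoring/ms365_dfe_incidents.py /api-monitoring/dfe-configs
```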

Okay, so I’ve created a wrapper script in Bash which applies the variables from /etc/environment and then runs the two scripts I need:

root@srv11snexechost:/usr/lib/check_mk_agent/local# cat ms365_dfe_incidents.sh
#!/bin/bash

# Source /etc/environment
source /etc/environment

env > /tmp/agentenv2.txt

# Run Python script to encrypt
python3 /api-monitoring/encrypt.py /api-monitoring/dfe-configs

# Run Python script for MS365 DFE incidents
python3 /api-monitoring/ms365_dfe_incidents.py /api-monitoring/dfe-configs
root@srv11snexechost:/usr/lib/check_mk_agent/local#

In /tmp/agentenv2.txt I can now see my environment variable. Unfortunately, the services are still not discovered in WATO, even though they still show up in the <<<local>>> section when running check_mk_agent.
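When a check prints nothing under the agent, for example because it fails on a missing variable, there is simply nothing to discover, and the failure stays invisible. A hypothetical guard that turns a variable missing from the environment into a visible UNKNOWN service; it relies on printenv, which as a separate program only sees exported variables, just as python3 would. The function and service names are placeholders.

```shell
#!/bin/bash
# Hypothetical guard for local check wrappers: if a required variable is not
# exported, print an UNKNOWN (state 3) local check line and return non-zero,
# so the problem shows up as a service instead of as missing output.
require_exported_env() {
  local name=$1 service=$2
  # printenv is an external program, so it only sees exported variables,
  # exactly like python3 started from this script would.
  if ! printenv "$name" > /dev/null; then
    printf '3 "%s" - environment variable %s is not exported to this check\n' "$service" "$name"
    return 1
  fi
}

# Usage in the wrapper, e.g. (VAR_NAME stands for the real variable name):
#   source /etc/environment
#   require_exported_env VAR_NAME "DFE Incidents" || exit 0
#   python3 /api-monitoring/ms365_dfe_incidents.py /api-monitoring/dfe-configs
```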

Regards
Kean

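Environment variables have to be exported to be usable in subprocesses such as python3. A plain NAME=value line, like those read in via source /etc/environment, only creates a shell variable inside the wrapper itself; it is not passed on to child processes unless it is exported.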

I suggest passing only the variables that are needed, and only to the process that needs them:

#!/bin/bash

# Run Python script to encrypt
MYVAR=somevalue python3 /api-monitoring/encrypt.py /api-monitoring/dfe-configs

# Also log the return value
echo $? > /tmp/mycheck.state
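The effect of a missing export can be reproduced in a few lines. This is a small demonstration, not part of anyone's check: DEMO_VAR is a throwaway name, and sh -c stands in for the python3 child process.

```shell
#!/bin/bash
# Shows why sourcing /etc/environment was not enough: a plain assignment
# creates a shell variable, which child processes such as python3 do not inherit.
demo_export() {
  unset DEMO_VAR
  DEMO_VAR=secret   # like a NAME=value line read via "source /etc/environment"
  echo "before export: $(sh -c 'echo "${DEMO_VAR:-<unset>}"')"
  export DEMO_VAR   # now part of the environment passed to child processes
  echo "after export: $(sh -c 'echo "${DEMO_VAR:-<unset>}"')"
  unset DEMO_VAR
}

demo_export
# before export: <unset>
# after export: secret
```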

GOT IT

I fixed it by adding the line:

export VAR_NAME

This way I don’t have to set the variable in the .sh script; I can pull it straight from /etc/environment into the shell script, and now the checks are discovered in WATO.

Thank you very much!

Regards
Kean


Thanks for the feedback. We will add a note on the agent's environment to the troubleshooting section of the user guide article on local checks. This is a very common pitfall.
