Custom script correctly returns WARN in Bash but always OK in WATO

CMK version:
OS version: RHEL 9.x

Error message:

Output of “cmk --debug -vvn hostname”: (attached)

The following script is /usr/lib/checkmk_mk_agent/local/check_wd_ondemand.sh. It is owned by root:root and is executable by everyone. When I run the script manually at the bash prompt, it returns the expected output and status when there are invalid connections, but in Checkmk WATO the service is always OK/valid. I have tried a full service rediscovery and the status is still OK. The script is deployed on multiple systems and all of them behave the same way.

#!/usr/bin/bash

shopt -s nocasematch

STATUS=$(sudo /usr/sap/hostctrl/exe/sapcontrol -nr 00 -function WebDispGetServerList | awk -F, '{ print $6 }' | grep -v status)

if echo "$STATUS" | grep -qFi "Not" ; then
echo “1 "Web Dispatcher" - Invalid connection(s)”
else
echo “0 "Web Dispatcher" - Valid connection(s)”
fi

value store: loading from disk
Checkmk version 2.4.0p12
+ FETCHING DATA
  Source: SourceInfo(hostname='host1a2.example.com', ipaddress='10.145.32.6', ident='agent', fetcher_type=<FetcherType.TCP: 8>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f46507c3140]
Read from cache: AgentFileCache(path_template=/omd/sites/simba1/tmp/check_mk/cache/host1a2.example.com, max_age=MaxAge(checking=0, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=6)
Not using cache (Too old. Age is 41 sec, allowed is 0 sec)
Connecting via TCP to 10.145.32.6:6556 (5.0s timeout)
Detected transport protocol: TransportProtocol.PLAIN
Reading data from agent
Closing TCP connection to 10.145.32.6:6556
Write data to cache file /omd/sites/simba1/tmp/check_mk/cache/host1a2.example.com
Trying to acquire lock on /omd/sites/simba1/tmp/check_mk/cache/host1a2.example.com
Got lock on /omd/sites/simba1/tmp/check_mk/cache/host1a2.example.com
Releasing lock on /omd/sites/simba1/tmp/check_mk/cache/host1a2.example.com
Released lock on /omd/sites/simba1/tmp/check_mk/cache/host1a2.example.com
[cpu_tracking] Stop [7f46507c3140 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=1.0099999997764826))]
  Source: SourceInfo(hostname='host1a2.example.com', ipaddress='10.145.32.6', ident='piggyback', fetcher_type=<FetcherType.PIGGYBACK: 4>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7f4650ad3920]
Read from cache: NoCache(path_template=/dev/null, max_age=MaxAge(checking=0.0, discovery=0.0, inventory=0.0), simulation=False, use_only_cache=False, file_cache_mode=1)
0 piggyback files for 'host1a2.example.com'.
0 piggyback files for '10.145.32.6'.
Get piggybacked data
[cpu_tracking] Stop [7f4650ad3920 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
[cpu_tracking] Start [7f46507af9b0]
+ PARSE FETCHER RESULTS
<<<check_mk>>> / Transition NOOPParser -> HostSectionParser
<<<cmk_agent_ctl_status:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<checkmk_agent_plugins_lnx:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<labels:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<df_v2>>> / Transition HostSectionParser -> HostSectionParser
<<<df_v2>>> / Transition HostSectionParser -> HostSectionParser
<<<systemd_units>>> / Transition HostSectionParser -> HostSectionParser
<<<nfsmounts_v2:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
<<<cifsmounts>>> / Transition HostSectionParser -> HostSectionParser
<<<mounts>>> / Transition HostSectionParser -> HostSectionParser
<<<ps_lnx>>> / Transition HostSectionParser -> HostSectionParser
<<<mem>>> / Transition HostSectionParser -> HostSectionParser
<<<cpu>>> / Transition HostSectionParser -> HostSectionParser
<<<uptime>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if>>> / Transition HostSectionParser -> HostSectionParser
<<<lnx_if:sep(58)>>> / Transition HostSectionParser -> HostSectionParser
<<<tcp_conn_stats>>> / Transition HostSectionParser -> HostSectionParser
<<<multipath>>> / Transition HostSectionParser -> HostSectionParser
<<<diskstat>>> / Transition HostSectionParser -> HostSectionParser
<<<kernel>>> / Transition HostSectionParser -> HostSectionParser
<<<md>>> / Transition HostSectionParser -> HostSectionParser
<<<vbox_guest>>> / Transition HostSectionParser -> HostSectionParser
<<<job>>> / Transition HostSectionParser -> HostSectionParser
<<<chrony:cached(1776883403,120)>>> / Transition HostSectionParser -> HostSectionParser
<<<local:sep(0)>>> / Transition HostSectionParser -> HostSectionParser
  HostKey(hostname='host1a2.example.com', source_type=<SourceType.HOST: 1>)  -> Add sections: ['check_mk', 'checkmk_agent_plugins_lnx', 'chrony', 'cifsmounts', 'cmk_agent_ctl_status', 'cpu', 'df_v2', 'diskstat', 'job', 'kernel', 'labels', 'lnx_if', 'local', 'md', 'mem', 'mounts', 'multipath', 'nfsmounts_v2', 'ps_lnx', 'systemd_units', 'tcp_conn_stats', 'uptime', 'vbox_guest']
  HostKey(hostname='host1a2.example.com', source_type=<SourceType.HOST: 1>)  -> Add sections: []
Received no piggyback data
CPU load             15 min load: 0.13, 15 min load per core: 0.02 (8 cores)
CPU utilization      Total CPU: 1.59%
Check_MK Agent       Version: 2.3.0p34, OS: linux, TLS is not activated on monitored host (see details), Agent plug-ins: 0, Local checks: 2
Disk IO SUMMARY      Read: 0.00 B/s, Write: 33.1 kB/s, Latency: 436 microseconds
Filesystem /         Used: 9.02% - 2.89 GiB of 32.0 GiB, trend per 1 day 0 hours: +19.5 MiB, trend per 1 day 0 hours: +0.06%, Time left until disk full: 4 years 66 days
Filesystem /boot     Used: 37.60% - 186 MiB of 495 MiB, trend per 1 day 0 hours: +0 B, trend per 1 day 0 hours: +<0.01%
Filesystem /boot/efi Used: 1.18% - 5.83 MiB of 495 MiB, trend per 1 day 0 hours: +0 B, trend per 1 day 0 hours: +0%
Filesystem /home     Used: 61.03% - 619 MiB of 1014 MiB, trend per 1 day 0 hours: +29 B, trend per 1 day 0 hours: +<0.01%
Filesystem /mnt      Used: 5.13% - 3.22 GiB of 62.7 GiB, trend per 1 day 0 hours: +0 B, trend per 1 day 0 hours: +0%
Filesystem /tmp      Used: 1.27% - 78.2 MiB of 5.99 GiB, trend per 1 day 0 hours: -3.00 KiB, trend per 1 day 0 hours: -0.00%
Filesystem /usr      Used: 37.11% - 3.71 GiB of 9.99 GiB, trend per 1 day 0 hours: +2.36 MiB, trend per 1 day 0 hours: +0.02%, Time left until disk full: 7 years 168 days
Filesystem /usr/sap  Used: 1.88% - 2.41 GiB of 128 GiB, trend per 1 day 0 hours: +13.9 MiB, trend per 1 day 0 hours: +0.01%
Filesystem /var      Used: 38.07% - 3.04 GiB of 7.99 GiB, trend per 1 day 0 hours: -3.84 MiB, trend per 1 day 0 hours: -0.05%
Kernel Performance   Process Creations: 6.51/s, Context Switches: 1779.40/s, Major Page Faults: 0.00/s, Page Swap in: 0.00/s, Page Swap Out: 0.00/s
Memory               Total virtual memory: 13.66% - 4.25 GiB of 31.1 GiB, 8 additional details available
Mount options of /   Mount options exactly as expected
Mount options of /boot Mount options exactly as expected
Mount options of /boot/efi Mount options exactly as expected
Mount options of /home Mount options exactly as expected
Mount options of /mnt Mount options exactly as expected
Mount options of /tmp Mount options exactly as expected
Mount options of /usr Mount options exactly as expected
Mount options of /usr/sap Mount options exactly as expected
Mount options of /var Mount options exactly as expected
Number of threads    582, Usage: 0.23%
SAP System Certificate Certificate expires Thu   Mar   11   18:59:59   2027
Systemd Service Summary Total: 174, Disabled: 14, Failed: 0
Systemd Socket Summary Total: 18, Disabled: 3, Failed: 0
TCP Connections      Established: 496
Uptime               Up since 2026-04-13 17:20:20, Uptime: 8 days 21 hours
Web Dispatcher       Valid connection(s)
0 piggyback files for 'host1a2.example.com'.
[cpu_tracking] Stop [7f46507af9b0 - Snapshot(process=posix.times_result(user=0.040000000000000036, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.03000000026077032))]
[agent] Success, [piggyback] Success (but no data found for this host), execution time 1.0 sec | execution_time=1.040 user_time=0.040 system_time=0.000 children_user_time=0.000 children_system_time=0.000 cmk_time_agent=1.010

Perhaps correcting the quotation marks will solve the problem:

echo "1 \"Web Dispatcher\" - Invalid connection(s)"
echo "0 \"Web Dispatcher\" - Valid connection(s)"
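
A side note on the quoting, as a minimal sketch: a local check line has the form "state, service name, metrics or -, text", and a service name containing a space has to be wrapped in literal double quotes. Putting the whole line in single quotes avoids the backslashes. The function names below are made up for illustration only.

```shell
#!/usr/bin/bash
# Two equivalent ways to emit a local check line whose service name contains a space.
# emit_escaped and emit_single_quoted are hypothetical names used only for this demo.

emit_escaped() {
    # Outer double quotes, inner double quotes escaped with backslashes.
    echo "1 \"Web Dispatcher\" - Invalid connection(s)"
}

emit_single_quoted() {
    # Outer single quotes: the inner double quotes need no escaping.
    echo '1 "Web Dispatcher" - Invalid connection(s)'
}

emit_escaped
emit_single_quoted
```

Both functions print the same line, so the choice is purely a matter of readability.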

At first, I thought the same. But then the “good case” would fail as well (with a syntax error), yet the output shows that it looks fine:

...
Web Dispatcher       Valid connection(s)
...

I think the quoting only looks wrong in the post because it is not formatted as code.

I suspect the sudo call or the call to sapcontrol. Maybe they behave differently if they cannot write to a terminal (which is the case when the Checkmk agent calls the plugin in the background). Or maybe they do not have a proper environment set when run by the agent.

Then $STATUS is likely empty, grep finds no “Not” in it, the else branch runs, and we see the “good case”.
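
To make that failure mode visible instead of silently OK, one option is to report UNKNOWN (state 3) when nothing came back. This is only a sketch: classify_status is a hypothetical helper, and the sample status strings are made up rather than real sapcontrol output.

```shell
#!/usr/bin/bash
# Sketch: treat an empty result as UNKNOWN (3) instead of falling through to OK.
# classify_status is a hypothetical helper, not part of the original script.

classify_status() {
    local status
    status=$1
    case $status in
        *[![:space:]]*)
            # Contains at least one non-whitespace character: evaluate it below.
            ;;
        *)
            # Empty or whitespace only: the command most likely failed.
            echo '3 "Web Dispatcher" - No status returned by sapcontrol'
            return 0
            ;;
    esac
    if printf '%s\n' "$status" | grep -qi "not"; then
        echo '1 "Web Dispatcher" - Invalid connection(s)'
    else
        echo '0 "Web Dispatcher" - Valid connection(s)'
    fi
}

classify_status ""                # what the agent run would see if sudo failed
classify_status "Not connected"   # made-up status text containing "Not"
```

With this guard, an empty $STATUS shows up as an UNKNOWN service in the GUI rather than as a misleading OK.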

I would try to redirect stderr to a log file from within the agent plugin and then check the log file for errors, like so:

#!/usr/bin/bash

LOGFILE=/tmp/err.log

exec {FD}>&2;       # Link file descriptor FD with stderr, i.e. save stderr in $FD
exec 2> $LOGFILE    # Replace stderr with file $LOGFILE.

shopt -s nocasematch

STATUS=$(sudo /usr/sap/hostctrl/exe/sapcontrol -nr 00 -function WebDispGetServerList | awk -F, '{ print $6 }' | grep -v status)

# log the status variable:
echo "STATUS='$STATUS'" >&2

if echo "$STATUS" | grep -qFi "Not" ; then
    echo "1 \"Web Dispatcher\" - Invalid connection(s)"
else
    echo "0 \"Web Dispatcher\" - Valid connection(s)"
fi

exec 2>&$FD {FD}>&-;    # Restore stderr and close file descriptor FD.

I would expect some interesting information in /tmp/err.log.
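
A related point: in a pipeline the exit status is that of the last command, so a failing sudo is hidden behind awk and grep. A rough sketch of running the data-gathering command on its own and checking its exit code first follows. run_checked is a hypothetical helper and the demo commands merely stand in for the sudo/sapcontrol call.

```shell
#!/usr/bin/bash
# Sketch: run the command separately so its exit code and stderr are not
# swallowed by the awk/grep pipeline. run_checked is a hypothetical helper.

run_checked() {
    local errfile out rc
    errfile=$(mktemp) || return 1
    out=$("$@" 2>"$errfile")
    rc=$?
    if [ "$rc" -ne 0 ]; then
        # Put the first line of stderr into the service output.
        echo "3 \"Web Dispatcher\" - command failed (rc=$rc): $(head -n 1 "$errfile")"
        rm -f "$errfile"
        return 1
    fi
    rm -f "$errfile"
    printf '%s\n' "$out"
}

# Stand-in for a failing sudo call:
run_checked sh -c 'echo "sudo: sorry, you must have a tty to run sudo" >&2; exit 1' || true
# Stand-in for a successful call (made-up CSV line):
run_checked printf 'a,b,c\n'
```

In a real plugin the arguments would be the sudo/sapcontrol command, and the awk/grep filtering would run on the returned text. This assumes the command exits with 0 on success, which is an assumption and not something confirmed in this thread.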


You are right. The err.log file contains the two lines:
sudo: sorry, you must have a tty to run sudo
STATUS=''

I changed the line to the following, which appears to work. I will test a little more and report back. Thanks all:
STATUS=$(script -q -c "sudo /usr/sap/hostctrl/exe/sapcontrol -nr 00 -function WebDispGetServerList | awk -F, '{ print $6 }' | grep -v status | grep -v ^$")

Thanks all, the change I mentioned in my previous update fixed the issue.
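
One caveat for anyone copying the script -c approach: inside a double-quoted command string, the calling shell expands $6 before awk ever sees it, so awk may end up printing the whole line rather than field 6. The sketch below shows the effect with sh -c standing in for script -q -c and a made-up CSV line; it has not been run against sapcontrol.

```shell
#!/usr/bin/bash
# Sketch: how awk field references behave inside a double-quoted command string.
# sh -c stands in for "script -q -c"; the CSV line is made-up sample data.

set --   # no positional parameters, as in a plain plugin run

# $2 is expanded by the calling shell (to nothing), so awk runs '{ print  }'.
unescaped=$(sh -c "echo 'a,b,c' | awk -F, '{ print $2 }'")

# \$2 survives as a literal $2 and reaches awk intact.
escaped=$(sh -c "echo 'a,b,c' | awk -F, '{ print \$2 }'")

echo "unescaped: $unescaped"
echo "escaped:   $escaped"
```

The unescaped form prints the whole line, and the check can still appear to work because "Not" is then matched anywhere in the line. Two further points, both hedged: the util-linux script command writes its log to a file named typescript in the current directory unless a file argument such as /dev/null is given, and the "you must have a tty" message usually comes from the requiretty option in sudoers, so relaxing that option for the user the agent runs as may be an alternative to wrapping the call in script.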
