Checkmk agent and telnet show different output

CMK version: 2.0.0p6 (CRE)
OS version: CentOS 7.9

We have created a local check for our Kubernetes cluster. For some reason the local check produces different output over telnet than from check_mk_agent, so no error is raised when the cluster is down.

Found a similar issue reported back in 2018: [Check_mk (english)] Difference between "check_mk_agent test" and "telnet client 6556"?

Any idea how this can be fixed?

check_mk_agent output:
2 k8_node_monitoring K8 cluster few nodes are not healthy. Run kubectl get nodes and check output

telnet output:
0 SPC_k8_node_monitoring K8 cluster are healthy

It looks like you run the agent manually with different permissions than when it runs as a service.

The Windows service for the Checkmk agent usually runs as the SYSTEM account; the Linux agent runs as root. Do these accounts have access to the information the local check needs?

The script has permission to run as root (755). Would that be enough, or is any other configuration needed?
Running on CentOS 7.9.

Is SELinux active? It may block certain actions when the calling process is network-connected.
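
A quick way to verify:

# Prints Enforcing, Permissive, or Disabled
getenforce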

SELinux is disabled. Any other ideas? I am out of ideas for debugging this at the moment.
@r.sander Does the Checkmk server get its data via check_mk_agent or via telnet? The details look correct with check_mk_agent but not with telnet.

What do you mean by "check_mk_agent output": running check_mk_agent, or running your local check directly?
Are you running the agent as root?

How can the service have different names??
k8_node_monitoring
vs
SPC_k8_node_monitoring

Something here is not correct. Did you write the script yourself?

@Anders When I run the command "check_mk_agent" on my client machine (as root) where Kubernetes is running, I get the correct output for the local check.
I did write the script; adding it here:

#!/bin/bash
service=SPC_k8_node_monitoring

get_procs() {
    count=0
    # Second column of "kubectl get nodes" is the node STATUS; drop the header.
    nodestatus=$(kubectl get nodes | awk '{print $2}' | grep -v STATUS)
    for i in $nodestatus
    do
        if [[ "$i" != "Ready" ]]
        then
            count=$(( count + 1 ))
        fi
    done

    if [ "$count" -ne 0 ]
    then
        echo "2 $service K8 cluster few nodes are not healthy. Run kubectl get nodes and check output"
    else
        echo "0 $service K8 cluster are healthy"
    fi
}

get_procs

I think the differing service name in the earlier comment is just a typo.

Adding output from check_mk_agent vs. telnet:

[root@XXXXXXX ~]$ telnet 0 6556|grep -i SPC_k8_node_monitoring
0 SPC_k8_node_monitoring K8 cluster are healthy

[root@XXXXXXX ~]$ check_mk_agent|grep -i SPC_k8_node_monitoring
2 SPC_k8_node_monitoring K8 cluster few nodes are not healthy. Run kubectl get nodes and check output

kubectl needs credentials to connect to K8s, correct?

Are you sure that it can read them when called from the socket 6556/tcp?

Sometimes the credentials or a pointer to them are stored in environment variables that are initialized via ~/.bashrc. This is not done when running the agent from the socket.
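
kubectl, for example, honors the KUBECONFIG environment variable. A minimal sketch (the path /root/.kube/config is an assumption; adjust it for your setup) that makes the check independent of any shell rc files:

#!/bin/bash
# Point kubectl at its credentials explicitly; when the agent is started
# from the 6556 socket, ~/.bashrc is never sourced, so exports made there
# are missing. /root/.kube/config is an assumed path.
export KUBECONFIG=/root/.kube/config
kubectl get nodes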

When the kubectl output is empty, your check logic will report the OK status. You should add some code that handles kubectl errors.
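
For example, a sketch (not a drop-in fix) that reports UNKNOWN when kubectl fails or returns no nodes:

# Capture kubectl's output and exit status separately, so a failure is
# not silently treated as "all nodes healthy".
if ! raw=$(kubectl get nodes 2>/dev/null); then
    echo "3 $service kubectl failed - cannot determine cluster health"
    exit 0
fi
nodestatus=$(echo "$raw" | awk 'NR > 1 {print $2}')
if [ -z "$nodestatus" ]; then
    echo "3 $service kubectl returned no nodes - cannot determine cluster health"
    exit 0
fi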

Interesting, I can't see any reason why this should not work. Right now it feels like you have two different Checkmk agents: one behind xinetd/systemd and another one that you are executing manually.

When you connect to 6556, your daemon will execute check_mk_agent (as the user you have specified).

As @r.sander said, it might be that kubectl has access to environment variables that the Checkmk agent does not have, even if it runs as root and executes via bash.

I changed my approach: a cron job now writes the node status to a file, and the local check picks up the not-ready nodes from there. Appreciate the support; please consider this query resolved.
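
For anyone landing here later, a minimal sketch of that workaround (the file path and schedule are illustrative assumptions, not the actual setup):

#!/bin/bash
# Root's crontab refreshes the node list every 5 minutes; cron sets
# HOME=/root, so kubectl finds ~/.kube/config without any shell rc files:
#   */5 * * * * kubectl get nodes > /var/tmp/k8_nodes.out 2>/dev/null
service=SPC_k8_node_monitoring
# Refuse to guess when the status file is missing or unreadable.
[ -r /var/tmp/k8_nodes.out ] || { echo "3 $service status file missing"; exit 0; }
notready=$(awk 'NR > 1 && $2 != "Ready"' /var/tmp/k8_nodes.out | wc -l)
if [ "$notready" -ne 0 ]; then
    echo "2 $service K8 cluster few nodes are not healthy. Run kubectl get nodes and check output"
else
    echo "0 $service K8 cluster are healthy"
fi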

We actually developed a plugin that reads files (in Checkmk local check format) and creates service checks in the agent just as if we were running a real plugin. That way we only need read access to those files and can run the agent as any user without extra permissions.
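
A bare-bones sketch of such a reader (the directory /var/lib/checkmk_custom_spool and the .txt suffix are assumed names, not the actual implementation):

#!/bin/bash
# Emit pre-formatted local check lines verbatim; each file is expected to
# already contain "<status> <service_name> <output>" lines.
for f in /var/lib/checkmk_custom_spool/*.txt; do
    [ -r "$f" ] && cat "$f"
done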
