We have created a local check for our Kubernetes cluster. For some reason the local check shows different output via telnet than via check_mk_agent, so no error is raised when the cluster is down.
It looks like you run the agent manually with different permissions than when it runs as a service.
The Windows service for the Checkmk agent usually runs as the SYSTEM account; the Linux agent runs as root. Do these accounts have access to the information the local check needs?
SELinux is disabled. Any other ideas? I am out of ideas for debugging this at the moment. @r.sander Does the Checkmk server get its data the way check_mk_agent does or the way telnet does? The details look fine with check_mk_agent but not with telnet.
@Anders When I run the command “check_mk_agent” on my client machine (as the root user) where Kubernetes is running, I get the correct output for the local check.
I did create the script; adding it here:
#!/bin/bash
service=SPC_k8_node_monitoring

get_procs() {
    count=0
    # STATUS is column 2 of "kubectl get nodes"; drop the header line
    nodestatus=$(kubectl get nodes | awk '{print $2}' | grep -v STATUS)
    for i in $nodestatus; do
        if [[ "$i" != "Ready" ]]; then
            count=$(( count + 1 ))
        fi
    done
    if [ "$count" -ne 0 ]; then
        echo "2 $service K8 cluster few nodes are not healthy. Run kubectl get nodes and check output "
    else
        echo "0 $service K8 cluster are healthy"
    fi
}

get_procs
I think the difference in output in the earlier comment is just a typo.
Adding output from check_mk_agent vs. telnet:
[root@XXXXXXX ~]$ telnet 0 6556|grep -i SPC_k8_node_monitoring
0 SPC_k8_node_monitoring K8 cluster are healthy
[root@XXXXXXX ~]$ check_mk_agent|grep -i SPC_k8_node_monitoring
2 SPC_k8_node_monitoring K8 cluster few nodes are not healthy. Run kubectl get nodes and check output
kubectl needs credentials to connect to K8s, correct?
Are you sure that it can read them when called from the socket 6556/tcp?
Sometimes the credentials, or a pointer to them, are stored in environment variables that are initialized via ~/.bashrc. This does not happen when the agent is run from the socket.
When the kubectl output is empty, your check logic outputs an OK status. You should add code that handles kubectl errors.
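A minimal sketch of that idea (the kubeconfig path is an assumption, adjust it for your setup): pin KUBECONFIG explicitly so the check does not depend on the calling user's environment, and report UNKNOWN (3) whenever kubectl itself fails instead of silently counting zero unhealthy nodes.

```shell
#!/bin/bash
# Sketch only: error-aware variant of the local check.
# The KUBECONFIG path below is an assumed location, not from the original post.
service=SPC_k8_node_monitoring
export KUBECONFIG="${KUBECONFIG:-/root/.kube/config}"

check_nodes() {
    local output not_ready
    # Capture stdout+stderr; the assignment's exit status is kubectl's
    if ! output=$(kubectl get nodes 2>&1); then
        # kubectl itself failed (missing credentials, API unreachable, ...)
        echo "3 $service kubectl failed: $output"
        return
    fi
    # Count nodes whose STATUS column (field 2) is not "Ready", skipping the header
    not_ready=$(printf '%s\n' "$output" | awk 'NR>1 && $2 != "Ready" {c++} END {print c+0}')
    if [ "$not_ready" -ne 0 ]; then
        echo "2 $service $not_ready node(s) not Ready. Run kubectl get nodes and check output"
    else
        echo "0 $service K8 cluster are healthy"
    fi
}

check_nodes
```

With this shape, a permission or environment problem on the socket path shows up as an UNKNOWN service instead of a false OK.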
Interesting, I can’t see any reason why this should not work. Right now it feels like you have two different Checkmk agents: one behind xinetd/systemd and another one that you are executing manually.
When you connect to port 6556, your daemon will execute check_mk_agent (as the user you have specified).
As @r.sander said, it might be that kubectl has access to environment variables that the Checkmk agent does not have, even if it is running as root and executing via bash.
I changed my approach: I ran a cron job to write out the results and picked up the not-ready nodes from there. Appreciate the support; please consider this query resolved.
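For anyone landing here later, a sketch of that cron-based split (the file path, the cron schedule, and the `read_nodes` helper name are my assumptions, not the original poster's exact setup): cron runs kubectl with root's full environment and dumps the raw node list to a file, and the local check only needs read access to that file.

```shell
#!/bin/bash
# Sketch of the cron-based variant; paths are assumptions.
#
# Hypothetical root crontab entry writing the raw node list every 5 minutes:
#   */5 * * * * kubectl get nodes > /var/tmp/k8_nodes.status 2>/dev/null
#
# The local check below then only reads the file, so it works the same
# whether the agent is called manually or via the 6556/tcp socket.
service=SPC_k8_node_monitoring
statusfile="${STATUSFILE:-/var/tmp/k8_nodes.status}"

read_nodes() {
    if [ ! -r "$statusfile" ]; then
        # Missing file means cron never ran (or failed): report UNKNOWN
        echo "3 $service status file $statusfile missing or unreadable"
        return
    fi
    local not_ready
    not_ready=$(awk 'NR>1 && $2 != "Ready" {c++} END {print c+0}' "$statusfile")
    if [ "$not_ready" -ne 0 ]; then
        echo "2 $service K8 cluster few nodes are not healthy. Run kubectl get nodes and check output"
    else
        echo "0 $service K8 cluster are healthy"
    fi
}

read_nodes
```

A stale file is still reported as OK here; pairing this with a file-age check (or the agent's spool mechanism) closes that gap.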
We actually developed a plugin that reads files (in Checkmk local check format) and creates service checks in the agent, just as if we were running a real plugin. That way we only need read access to those files and can run the agent as any user without extra permissions.
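Worth noting that the stock Linux agent has a similar mechanism built in: files dropped into its spool directory (/var/lib/check_mk_agent/spool by default) are appended verbatim to the agent output, and a numeric filename prefix sets a maximum age in seconds after which a stale file is ignored. A sketch of a cron job using that, where the 600-second prefix choice and the kubectl handling are my assumptions:

```shell
#!/bin/bash
# Sketch: cron job writing a pre-built local-check line into the Checkmk
# agent's spool directory. The 600-second max-age prefix means a dead cron
# job becomes a missing service instead of a permanently frozen OK.
spoolfile="${SPOOLFILE:-/var/lib/check_mk_agent/spool/600_SPC_k8_node_monitoring}"

write_spool() {
    local not_ready
    not_ready=$(kubectl get nodes 2>/dev/null | awk 'NR>1 && $2 != "Ready" {c++} END {print c+0}')
    {
        # Spool files must carry their own section header
        echo '<<<local>>>'
        if [ "$not_ready" -ne 0 ]; then
            echo "2 SPC_k8_node_monitoring K8 cluster few nodes are not healthy"
        else
            echo "0 SPC_k8_node_monitoring K8 cluster are healthy"
        fi
    } > "$spoolfile"
}

# Only write when the spool directory actually exists on this host
if [ -d "$(dirname "$spoolfile")" ]; then
    write_spool
fi
```

This keeps all kubectl credentials on the cron side, while the agent (whatever user it runs as) only reads the spool file.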
This topic was automatically closed 365 days after the last reply. New replies are no longer allowed. Contact an admin if you think this should be re-opened.