Local check "Item not found in monitoring data" when result not OK

CMK version: Checkmk Raw Edition 2.2.0p17
OS version: RHEL 9.4

Hello,

I’m quite new to the whole checkmk environment.

I have a small local check on some servers : /lib/check_mk_agent/local/300/check_radius_auth

> #!/bin/bash
> 
> source /etc/check_mk/check_radius_auth.cfg
> 
> if [[ -z $PORT ]] ; then
>     PORT=1812
> fi
> 
> SERVICE_NAME="Radius Auth"
> 
> if [[ -z $PASSWORD ]] ; then
>     CHECK=$(echo "User-Name=$USER" | radclient -r 1 $HOST:$PORT auth $SECRET 2>&1)
> else
>     CHECK=$(echo "User-Name=$USER,User-Password=$PASSWORD" | radclient -r 1 $HOST:$PORT auth $SECRET 2>&1)
> fi
> 
> if [[ $? -eq 0 ]] ; then
>     if ( echo $CHECK | grep -q "Received Access-Accept" ) ; then
>         echo "0 \"$SERVICE_NAME\" - OK (Received Access-Accept)"
>     else
>         echo "0 \"$SERVICE_NAME\" - OK (But maybe not ?)"
>     fi
>     exit 0
> else
>     if ( echo $CHECK | grep -q "Shared secret is incorrect" ) ; then
>         echo "1 \"$SERVICE_NAME\" - ERROR (Shared secret is incorrect)"
>     elif ( echo $CHECK | grep -q "Expected Access-Accept got Access-Reject" ) ; then
>         echo "1 \"$SERVICE_NAME\" - ERROR (got Access-Reject)"
>     else
>         echo "1 \"$SERVICE_NAME\" - ERROR (Unknown)"
>     fi
>     exit 1
> fi

It basically checks whether a radius connection is accepted or not. Run by hand, the script prints the expected line in every situation.

When the connection is OK it’s also OK in checkmk webui :

# /lib/check_mk_agent/local/300/check_radius_auth
0 "Radius Auth" - OK (Received Access-Accept)

When the connection is rejected, the check goes to Unknown with “Item not found in monitoring data” in the summary instead of going to the Warning state.

# /lib/check_mk_agent/local/300/check_radius_auth 
1 "Radius Auth" - ERROR (got Access-Reject)

When the connection is OK again, the check comes back up as before.

What did I do wrong ?

Regards,

Johan

Hi Johan and welcome to the forum!

That’s weird behaviour on checkmk’s part, and I don’t know whether it is intentional. The culprit is the exit 1 in the bad case.

The thing is: if the local check is located directly in /usr/lib/check_mk_agent/local/ and runs synchronously every minute, then the return code of the local check doesn’t matter. It is ignored.

BUT if the file is located in one of the numbered subdirectories (like /usr/lib/check_mk_agent/local/300, where the directory name is the interval in seconds), then the check is run in the background (here every five minutes), and if the return code is != 0, its output is simply discarded. With no output for the service, checkmk reports “Item not found in monitoring data”.

So just change exit 1 to exit 0 in all cases.
Except maybe for some fatal error if the script cannot run at all, like wrong credentials or access rights or something like that. But be aware that it will then behave differently if run every minute vs. every N minutes.
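
A minimal sketch of that change, reusing the messages from the script above. The helper name report_line is hypothetical (it is not part of checkmk); it takes the radclient exit status and output as arguments, prints the local check line, and always returns 0, so the state is carried only by the first field of the line:

```shell
#!/bin/bash
# report_line RC OUTPUT
# Hypothetical helper: maps a radclient result to one local check line.
# It always returns 0; the service state is the first field of the line.
report_line() {
    service="Radius Auth"
    if [ "$1" -eq 0 ]; then
        case $2 in
            *"Received Access-Accept"*)
                echo "0 \"$service\" - OK (Received Access-Accept)" ;;
            *)
                echo "0 \"$service\" - OK (But maybe not ?)" ;;
        esac
    else
        case $2 in
            *"Shared secret is incorrect"*)
                echo "1 \"$service\" - ERROR (Shared secret is incorrect)" ;;
            *"Expected Access-Accept got Access-Reject"*)
                echo "1 \"$service\" - ERROR (got Access-Reject)" ;;
            *)
                echo "1 \"$service\" - ERROR (Unknown)" ;;
        esac
    fi
    return 0
}
```

In the original script this would replace the final if/else: save the status right after the radclient call (rc=$?), call report_line "$rc" "$CHECK", and end the script with exit 0.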

Hello Dirk,

Thank you very much ! It was so simple.

It’s true that the documentation doesn’t mention exit codes, which is why I missed it :)

Thanks again !

Regards,

Johan

Can you please mark the answer as the solution if it solved your problem?

The exit code for the script should be 0.

The script did run, and generated its report.

It should only return a non-zero exit code if the script didn’t complete for some catastrophic reason.
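
One way to keep the non-zero exit for the “could not run at all” case is a small preflight step before the actual check. This is only a sketch: the function name preflight, its two arguments, and the return codes 2 and 3 are illustrative choices, not checkmk conventions.

```shell
#!/bin/bash
# preflight CONFIG_FILE TOOL
# Returns non-zero only when the check cannot run at all.
preflight() {
    # The configuration file must exist and be readable.
    [ -r "$1" ] || return 2
    # The client binary must be found on PATH.
    command -v "$2" >/dev/null 2>&1 || return 3
    return 0
}
```

In the script from this thread it could be called as preflight /etc/check_mk/check_radius_auth.cfg radclient || exit $? before anything else runs. Keep in mind that in a numbered subdirectory a non-zero exit means the output is discarded, so the service will show “Item not found in monitoring data” rather than a state of its own.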
