Monitor a UPS Zen-X with a .txt file

openmindz · January 27, 2022, 7:58pm

First of all, $OMD_ROOT corresponds, in your case, to /omd/sites/monsvrs, so yes, the piggybacked data should be there (meaning /omd/sites/monsvrs/tmp/check_mk/piggyback; the var I had above was wrong).

Your local check should be in the “LocalDirectory” (my agent says it is /usr/lib/check_mk_agent/local) and, as previously stated, it must be executable.

Can you try to run your agent in debug mode and carefully check:

the location it says for your LocalDirectory
the output after the section # Local checks? For me, it looks like this (my local check is called localtest2):

# Local checks
echo '<<<local>>>'
+ echo '<<<local>>>'
<<<local>>>
if cd "$LOCALDIR"; then
    for skript in ./*; do
        if is_valid_plugin "$skript"; then
            ./"$skript"
        fi
    done
    # Call some plugins only every X'th second
    for skript in [1-9]*/*; do
        if is_valid_plugin "$skript"; then
            run_cached "local_${skript//\//\\}" "${skript%/*}" "$skript"
        fi
    done
fi
+ cd /usr/lib/check_mk_agent/local
+ for skript in './*'
+ is_valid_plugin ./localtest2
+ pattern='\.dpkg-(new|old|temp)$'
+ [[ -f ./localtest2 ]]
+ [[ -x ./localtest2 ]]
+ [[ ! ./localtest2 =~ \.dpkg-(new|old|temp)$ ]]
+ true
+ ././localtest2
<<<<otherhost>>>>
<<<local>>>
0 Status - I'm OK
0 Voltage - I'm OK, too
<<<<>>>>
--- rest skipped intentionally ---
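If you want to reproduce those checks by hand without reading the whole trace, here is a minimal sketch. The function name looks_like_valid_plugin is made up for illustration; it only mirrors the three tests visible in the trace above (regular file, executable, no .dpkg-new/.dpkg-old/.dpkg-temp suffix) and is not the agent's actual is_valid_plugin code.

```shell
#!/bin/sh
# Hypothetical helper (not part of the agent): mirrors the tests seen in the debug trace.
looks_like_valid_plugin() {
    candidate=$1
    [ -f "$candidate" ] || return 1   # must be a regular file
    [ -x "$candidate" ] || return 1   # must be executable
    case "$candidate" in
        *.dpkg-new|*.dpkg-old|*.dpkg-temp) return 1 ;;  # package-manager leftovers are skipped
    esac
    return 0
}

# Demo in a throwaway directory so nothing under /usr/lib is touched.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "0 Status - I am OK"\n' > "$demo_dir/localtest2"

# Before chmod the file is not executable, so it would be skipped.
looks_like_valid_plugin "$demo_dir/localtest2" && echo "valid" || echo "skipped: not executable yet"
chmod +x "$demo_dir/localtest2"
looks_like_valid_plugin "$demo_dir/localtest2" && echo "valid" || echo "skipped"
```

On the real system you would point the function at the file in your LocalDirectory instead of the temporary demo file.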

Thomas

Obeylife10 · January 28, 2022, 8:53am

Hi @openmindz and thanks for helping me,

I tried the debug mode with check_mk_agent -d 2>&1 | less, but the output is too big, so I added | grep -e LocalDirectory to find what you were asking for.

For the LocalDirectory, it returns this:

echo "LocalDirectory: $LOCALDIR"
+ echo 'LocalDirectory: /usr/lib/check_mk_agent/local'
LocalDirectory: /usr/lib/check_mk_agent/local

For the local checks, it looks like I only found this:

  # Local checks
    if cd "$LOCALDIR"; then
        if $MK_RUN_SYNC_PARTS; then
            echo '<<<local:sep(0)>>>'
            for skript in ./*; do
                if is_valid_plugin "$skript"; then
                    ./"$skript"
                fi
            done
        fi
        # Call some plugins only every X'th second
        for skript in [1-9]*/*; do
            if is_valid_plugin "$skript"; then
                run_cached "local_${skript//\//\\}" "${skript%/*}" "$skript"
            fi
        done
    fi

Should I remove the agent and reinstall it? Maybe trying @r.sander’s method (the plugin) caused a conflict with the local checks?

Stupid question: when I put the script in the directory, should Checkmk execute it automatically?

And since my Checkmk server and agent are on the same machine, I don’t really need to use the piggyback mechanism, right?

EDIT:

If I add grep -A 20 mylocalcheck (mylocalcheck = name of the script) to the previous command, it returns this:

+ is_valid_plugin ./mylocalcheck
+ pattern='\.dpkg-(new|old|temp)$'
+ [[ -f ./mylocalcheck ]]
+ [[ -x ./mylocalcheck ]]
+ [[ ! ./mylocalcheck =~ \.dpkg-(new|old|temp)$ ]]
+ true
+ ././mylocalcheck
<<<local:sep(0)>>>
<<<local>>>
0 Status - I'm OK
0 Voltage - I'm OK, too
+ for skript in [1-9]*/*
+ is_valid_plugin '[1-9]*/*'
+ pattern='\.dpkg-(new|old|temp)$'
+ [[ -f [1-9]*/* ]]
+ false
+ true
+ run_plugins
+ cd /usr/lib/check_mk_agent/plugins
+ true
+ for skript in ./*
+ is_valid_plugin './*'
+ pattern='\.dpkg-(new|old|temp)$'
+ [[ -f ./* ]]
+ false
+ for skript in [1-9]*/*
+ is_valid_plugin '[1-9]*/*'

openmindz · January 28, 2022, 10:17am

Hi @Obeylife10

Reinstalling the agent: no, I don’t think that’s necessary, but the plugin looks to me as if it is for Checkmk 1.6 and not for 2.0. Anyway, that shouldn’t interfere with local checks.

Automatic execution: yes, if the script is executable. From your output I can see that it is being executed.

Piggyback: in theory, no, you don’t need it. But the services your local check creates will be associated with your “main” host. If you can live with that, and just “know” that those two services refer to your UPS, that’s fine. If you want to have your UPS as a separate host in Checkmk, I’d personally use the piggyback method.

Back to the issue: the script mylocalcheck is being executed, but as far as I can see, the <<<<otherhost>>>> identifier is not present, so I believe that’s the reason why no piggyback data is present.

Can you try two things:

Remove the line with echo <<<local>>> from mylocalcheck. Wait three minutes or so, and recheck your host in “Setup”: theoretically, your host should discover two new services. This would “fulfill” the scenario of having services which refer to the UPS associated with your “main host”.
Add the two piggyback lines (echo <<<<otherhost>>>> at the top and echo <<<<>>>> at the bottom, as in my output above) and the line with echo <<<local>>> to mylocalcheck. Again, wait three minutes or so. Theoretically, piggyback data should exist now, your host should “lose” the two services it discovered before, and if you discover services for “otherhost” it should find them. This would result in having two hosts in Checkmk: your main host and the piggybacked one.
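
For the second variant, a minimal sketch of what mylocalcheck could look like. The host name otherhost is just the placeholder from my example and has to match the name of the piggybacked host in “Setup”; the two service lines are dummy values.

```shell
#!/bin/sh
# Sketch of a local check that hands its services to a piggybacked host.
# "otherhost" is a placeholder; the service lines are dummy data.
print_piggybacked_checks() {
    echo "<<<<otherhost>>>>"   # everything below belongs to otherhost
    echo "<<<local>>>"
    echo "0 Status - I'm OK"
    echo "0 Voltage - I'm OK, too"
    echo "<<<<>>>>"            # end of piggyback data, back to the main host
}

print_piggybacked_checks
```

For the first variant you would simply leave out the two lines with the four angle brackets.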

I hope all my explanations are not too convoluted…

Thomas

Obeylife10 · January 28, 2022, 10:40am

I understand the theory better now; you have helped me a lot.

OK, now I finally found the right syntax for the script and I got a result:

My services appear correctly, but alongside the other services of the host, like you said.

Here is the script in question:

#!/bin/sh

echo "<<<local>>>"

echo "0 Status - I'm OK"
echo "0 Voltage - I'm OK too"

In the host settings, I had to specify 127.0.0.1 as the IP address and leave the agent enabled rather than disabling it.

Now I will try the piggyback method.

openmindz · January 28, 2022, 10:42am

Hey @Obeylife10

Cool, glad it worked!

Yes, of course, I forgot to mention that, you’re right.

Thomas

Obeylife10 · January 28, 2022, 10:57am

So if I use the piggyback method, do I also have to specify an IP address and enable the API/agent?

openmindz · January 28, 2022, 11:00am

If you want to use/test the piggyback method, you will have two hosts in “Setup” in the end:

Your main host, with default settings (i.e. agent enabled, IP address is set) as it is now.
Your otherhost, which is piggybacked, has no agent and no IP, as in my example.

Thomas

Obeylife10 · January 31, 2022, 9:16am

Hi everyone,

@openmindz I tried the method on Friday, and after rebooting my computer and checking it today, it looks like it’s working well. Thank you!

Now I just have to adapt a new script to get the result I need.

Is it possible to add these two services to the main dashboard?

openmindz · January 31, 2022, 11:57am

Hi @Obeylife10

Cool, glad that it worked for you!
First of all, the “main dashboard” is different in an Enterprise Edition compared to a Raw Edition, but the following applies:

If there’s a WARN or CRIT alert for your services, it will show up in the “main dashboard”.
If you want a somewhat different display (e.g. you want to see those services no matter what), you’d need to modify your dashboard accordingly (or create a new one). There is a nice section about what exactly one sees on the “main dashboard”, and how to modify it.
The article about views might also be of interest to you.

HTH,
Thomas

Obeylife10 · February 16, 2022, 9:25am

Hi everyone, hi @openmindz

Sorry for the late reply.

You’re right: when there is a WARN or CRIT alert, the services appear in the main dashboard.

I don’t really understand how the synchronization with the local check works. For example, when I just modify my fake local check txt file, changing the status value of the service from 0 to 2 (so my status should change from “OK” to “CRIT”), sometimes the change applies immediately and sometimes it can take hours.

Anyway, now I will work on the final script, which will convert my data from a txt file (named test2.txt) created by the first script (named ups.sh) into the right local check syntax. The first script just extracts the information from the UPS and puts it into a txt file.

“ups.sh” script:

#!/bin/bash
upsc servers@localhost | grep -e ups.status -e "output.voltage" -e "input.voltage" -e "battery.voltage" -e "battery.charge" -e ups.beeper.status > /home/user1/test2.txt

I used crontab to run this script every minute; it overwrites my txt file each time.

The information extracted to the txt file looks like this:

output.voltage: 223.7

ups.delay.shutdown: 30
ups.delay.start: 180
ups.load: 6

ups.status: OB

(OB = The UPS is on Battery
OL = The UPS is Online)

How can I convert this data into the right local check syntax?

I hope you understood what I meant.

Is creating a new script a good idea, or do you think there is a better method?

Thanks.

mike1098 · February 16, 2022, 11:14am

Hello,

I didn’t read the full thread because it’s quite long, so I apologize if I misunderstand something here.

As I understand it, it is not necessary to run a separate cron job and then parse the txt file with a local check. This could all be done in one local check.

Just a rough sketch without any guarantee:

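#!/bin/bash
# Read all UPS values once, then report on the output voltage.
echo "<<<local>>>"
OUTPUT=$(upsc servers@localhost)
# Take the value after ': ' and drop the decimals so the shell can compare integers.
voltage=$(grep -e "output.voltage" <<< "${OUTPUT}" | awk -F ': ' '{print $2}' | awk -F '.' '{print $1}')

# Quoted so the test does not break with a syntax error if no value was found.
if [ "${voltage}" -lt 200 ]; then
   echo  "1 ups_voltage voltage=${voltage};200;190 WARNING - Voltage is at ${voltage}"
fi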

That should give you a good starting point. Extend it by the other parameters.
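
For the text value ups.status the same idea could look roughly like this. Again only a sketch: the sample data is taken from the post above instead of a live upsc call, the service name ups_status is made up, and mapping OL to OK, OB to CRIT and anything else to UNKNOWN is my assumption.

```shell
#!/bin/sh
# Sketch: turn the ups.status line into a local check line.
# Assumptions: service name "ups_status"; OL -> OK, OB -> CRIT, anything else -> UNKNOWN.
status_to_check_line() {
    # $1 = full upsc-style output ("key: value" per line)
    status=$(printf '%s\n' "$1" | awk -F': ' '$1 == "ups.status" {print $2}')
    case "$status" in
        OL) echo "0 ups_status - UPS is online (${status})" ;;
        OB) echo "2 ups_status - UPS is on battery (${status})" ;;
        *)  echo "3 ups_status - Unexpected ups.status value: '${status}'" ;;
    esac
}

# Sample data from the thread; on the real system use OUTPUT=$(upsc servers@localhost).
OUTPUT='output.voltage: 223.7
ups.load: 6
ups.status: OB'

status_to_check_line "$OUTPUT"
```

The catch-all branch also covers the case where the ups.status line is missing, so the service does not silently report OK.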

regards

Michael

openmindz · February 16, 2022, 11:50am

Hi @Obeylife10

What @mike1098 proposes looks very good, and is probably a starting point for you to write a local check fulfilling your requirements. Please see also the official documentation about local checks, which has a lot of useful information you may want to incorporate.

HTH,
Thomas

Obeylife10 · February 16, 2022, 12:48pm

Yes, it makes more sense @mike1098, I will do this.

@openmindz Yes, I looked at this page. There are several interesting options to incorporate into my script; I will test them progressively.

For the moment, I do not have the UPS in my possession, so the upsc command will not work. I will try to replace this command with fake data like:

battery.charge: 85
battery.voltage: 12.60
input.current.nominal: 2.0
input.frequency: 0.0
input.voltage: 8.2
output.voltage: 223.7
ups.delay.shutdown: 30
ups.delay.start: 180
ups.load: 6
ups.status: OB

just for testing, and I’ll get back to you.

Thanks for the help !

Obeylife10 · February 16, 2022, 4:20pm

For the moment, I just created a txt file with simulated data. Here is what it looks like:

output.voltage: 210.7
battery.charge: 85
battery.voltage: 12.60
input.current.nominal: 2.0
input.frequency: 0.0

I modified the script in question:

#!/bin/bash
#echo "<<<<UPShost>>>>"
#echo "<<<local>>>"

OUTPUT=$(cat /home/user1/5.txt)

voltage=$(grep -e "output.voltage" <<< "${OUTPUT}" | awk -F ': ' '{print $2}' | awk -F '.' '{print $1}')

if [ ${voltage} -gt 200 ]; then
   echo  "1 ups_voltage voltage=${voltage};200;190 WARNING - Voltage is at ${voltage}"
fi

And it’s working well:

I didn’t quite understand the metric here:

voltage=${voltage};200;190

Does that mean that if the voltage is at “190”, it will be CRITICAL instead of WARNING?

openmindz · February 16, 2022, 7:49pm

Hi @Obeylife10

Congratulations, my friend, nice! Regarding your question about metrics: the manual provides all the answers you need. Let me elaborate:

Since your check output always has 1 as the first character, every check result is considered a “WARN” state. Please see the official documentation about local checks to understand which parts mean what for the Checkmk server: 2.1. Creating the script.
That said, yes, “190” would be the value you have set as the “CRIT” threshold. Here is an explanation of what the semicolon-separated values mean: 3.1. Metrics. Of course, this is only true if your script is actually checking for that, meaning:
In order to make your script respect your threshold and “act accordingly”, you need to code its logic: the agent doesn’t do that for you. So you need to have some kind of “conditional”, e.g. values between X and Y result in an “OK” state, values between Y and Z mean “WARN”, etc. Hint: there are countless tutorials on the web for “simple Nagios” checks, and you can utilize their logic for Checkmk local scripts, too. Here is one I found at HowtoForge.
Alternatively, you could let the threshold be calculated dynamically, see this: 3.4. Calculating status dynamically.
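
To illustrate the “conditional” idea, here is a minimal sketch in which the script itself decides the state. It assumes that a low voltage is the problem and reuses the 200 and 190 from your metric as WARN and CRIT limits; the function name voltage_check_line is made up, so adapt both to what you actually need.

```shell
#!/bin/sh
# Sketch: the script decides the state; the agent only passes the line on.
# Assumption: a LOW voltage is the problem, with WARN below 200 and CRIT below 190.
voltage_check_line() {
    voltage=$1   # integer volts, decimals already stripped
    warn=200
    crit=190
    if [ "$voltage" -lt "$crit" ]; then
        state=2; label="CRIT"
    elif [ "$voltage" -lt "$warn" ]; then
        state=1; label="WARN"
    else
        state=0; label="OK"
    fi
    # First field is the state the server will show; the metric values only feed the graph.
    echo "${state} ups_voltage voltage=${voltage};${warn};${crit} ${label} - Voltage is at ${voltage}"
}

voltage_check_line 223
voltage_check_line 195
voltage_check_line 185
```

Note that the script always prints a line, also in the OK case, so the service keeps getting fresh data.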

HTH,
Thomas

mike1098 · February 17, 2022, 8:48am

Hello,

The two values 200;190 are the WARNING and CRITICAL values for the metric.

You have the choice:
Either you determine the state of your check in your code with conditions (if this > that, do this), in which case the two values after voltage just produce a yellow and a red line in the graph,
or
you can use:

echo  "P ups_voltage voltage=${voltage};200;190 WARNING - Voltage is at ${voltage}"

Then Checkmk takes these two values and determines the state of the check. That’s quite handy in this case, but it does not work in complex situations or if you need to test a text status.

As @openmindz mentioned, please read the docs

BR

Michael

Obeylife10 · February 17, 2022, 10:37am

OK, thanks, I’ll do my best.

That’s quite handy in this case, but it does not work in complex situations or if you need to test a text status.

I can change the values manually in the txt file, just to get an overview. Do you know why, when I make changes to the files, Checkmk sometimes doesn’t update right away, even if I run a full service scan (particularly with the piggyback method)? I’m forced to reboot the VM.

About the metrics, for example the output_voltage:

I want it to return CRITICAL if the value is lower than 180,
WARN if the value is between 180 and 195,
and OK if the value is higher than 195.

From what I’ve read, the best option to use is 3.5. Upper and lower thresholds?

Here is the syntax:

metricname=value;warn_lower:warn_upper;crit_lower:crit_upper

So my metric should look like this?

echo "P output_voltage voltage=${voltage};195:180;180:0 WARNING - Voltage is at ${voltage}"

But if I set the value of “output_voltage” to 190, for example, it returns CRIT:

mike1098 · February 17, 2022, 11:14am

In your metric, crit_upper is 0, and 190 is above 0, so the result is CRIT.
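
To see why the result is CRIT, it can help to mimic the evaluation in a few lines of shell. This is only my reading of the lower:upper syntax (a value is bad when it is below the lower bound or above the upper bound), not Checkmk's actual code. The function name eval_levels is made up, and the upper bounds 250 and 260 are arbitrary placeholders.

```shell
#!/bin/sh
# Sketch, NOT Checkmk code: mimics "value;warn_lower:warn_upper;crit_lower:crit_upper"
# under the assumption that a value violates a pair when it is
# below the lower bound or above the upper bound.
eval_levels() {
    value=$1; warn_lower=$2; warn_upper=$3; crit_lower=$4; crit_upper=$5
    if [ "$value" -lt "$crit_lower" ] || [ "$value" -gt "$crit_upper" ]; then
        echo "CRIT"
    elif [ "$value" -lt "$warn_lower" ] || [ "$value" -gt "$warn_upper" ]; then
        echo "WARN"
    else
        echo "OK"
    fi
}

# The metric from the post, 195:180;180:0: crit_upper is 0 and 190 is greater than 0.
eval_levels 190 195 180 180 0

# Lower bounds as intended (WARN below 195, CRIT below 180) with placeholder upper bounds.
eval_levels 190 195 250 180 260
eval_levels 220 195 250 180 260
eval_levels 170 195 250 180 260
```

So the lower value of each pair goes first, and the upper value has to be something the voltage will not normally exceed.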

mike1098 · February 17, 2022, 11:16am

Take into account that with piggyback you need to update both hosts: the host which collects the information via the agent, and then the host consuming the created piggyback data.

BR

Michael

Obeylife10 · February 17, 2022, 1:04pm

Yeah, that’s exactly what I did. I commented out the piggyback line for the moment; I will look at this at the end.

So I have to remove the “crit_upper”? That would give: echo "P output_voltage voltage=${voltage};195:180;180;

But it also returns CRIT for 220: