Local checks in cluster

marbaa · February 10, 2021, 1:08pm

Hi,

We have a 5-node MapR cluster. Node1 and Node2 each run the REST API. My local check queries the REST API on node1 from node1, and on node2 from node2.

I also have a cluster host in Check_MK that consists of the real hosts Node1 and Node2.

The problem is that the output is now doubled, with an automatically added prefix showing which node each result came from. Is it possible to hide that prefix and show the text only once? Or is that not possible with local checks, so that I need to turn them into MRPE checks? Or do my scripts need to be adapted somehow?

marbaa · February 11, 2021, 8:17am

I would like to avoid rewriting the script for MRPE, as I don’t know how to pass performance data from an MRPE-style script.
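For reference, MRPE runs classic Nagios-style plugins, and that convention carries performance data after a pipe character on the output line, while the state is given by the exit code. A minimal Python sketch, assuming a hypothetical metric name `fs_used_percent` and made-up thresholds (none of these names come from the thread):

```python
#!/usr/bin/env python3
"""Sketch of an MRPE-style (Nagios-compatible) plugin output with performance data."""

OK, WARN, CRIT = 0, 1, 2
LABELS = {OK: "OK", WARN: "WARNING", CRIT: "CRITICAL"}


def build_plugin_output(used_percent, warn=80.0, crit=90.0):
    """Return (exit_code, output_line) following the Nagios plugin convention.

    The part before " | " is the human-readable text; the part after it is
    performance data in the form label=value;warn;crit;min;max.
    """
    if used_percent >= crit:
        state = CRIT
    elif used_percent >= warn:
        state = WARN
    else:
        state = OK
    text = f"{LABELS[state]} - {used_percent:.2f}% space used"
    perfdata = f"fs_used_percent={used_percent:.2f};{warn:.2f};{crit:.2f};0;100"
    return state, f"{text} | {perfdata}"


# Hypothetical value; a real script would fetch it from the MapR REST API.
state, line = build_plugin_output(62.8)
print(line)
```

A real plugin would end with `sys.exit(state)` so that the exit code carries the state to the agent.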

marbaa · February 17, 2021, 6:59am

Has nobody tried local checks in a cluster?
Is it not possible to have the output printed only once, as with MRPE?
In the picture, the “test_plugin” check is the MRPE script and the “testing” check is the local script.
[Screenshot: check_mk_local_cluster]

andreas-doehler · February 17, 2021, 11:42am

Your example means the MRPE is not cluster aware and prints the first result.
The local check prints all results. That is the right behavior.

What you need is a real check that can handle such conditions, for example that the service runs on x nodes of a cluster, and so on.

marbaa · February 18, 2021, 9:40am

I thought it was enough to put the MRPE script on two servers, create a virtual cluster in Check_MK, and assign the service under “Clustered services for overlapping clusters”. So I was wrong, and some of our checks have been running as clustered in the wrong sense for a pretty long time. That’s not good.

No idea how to do that. I can’t find any documentation or examples for that.

openmindz · February 18, 2021, 9:57am

Hi @marbaa

As I understand the section “overlapping clusters”, here:

You would only need this if you are e.g. running a service that is running on “both nodes” and
is shared by let’s say two VIPs. If this is your scenario, then you haven’t done anything wrong
(again: if I understand aforementioned section correctly).

I realized, that for many (if not most) of my clusters, I don’t need “overlapping” ones after all, but that may differ on your end. I believe what Andreas meant though, isn’t something that you will find in the documentation.

I think he meant that your script will need some built-in logic that “understands”, e.g., on
which node a particular service is “active”, so that your check output reflects the correct state
of your cluster.

Thomas

marbaa · February 18, 2021, 10:21am

Thanks @openmindz. I have read that section now, and I’m even more confused than I was before. I probably don’t understand the documentation correctly, and I don’t understand what to use for my needs.

MapR is basically its own NFS filesystem implementation, spread over a few identical servers. The REST API service runs on two of the servers. Each server has its own IP, so each service instance also has its own IP. Information about the filesystem is available from either the first service or the second. The output from both will be the same; they run on two hosts just for redundancy.

Can you recommend what type of check I should use for that, so that I don’t get duplicate output like in my first post?

andreas-doehler · February 18, 2021, 10:40am

In this case it would be best to use the BI function to create virtual objects and then use these objects for notification as the cluster status.

I’ll try to describe it.

Create a BI rule to aggregate the REST services. The aggregation function should be “best” in this case as you said it must run 1 time but can run 2 times.
Create a BI rule for the other services where the aggregation function is something like “over 60% is ok and over 40% is warning”.
Now you have two aggregation rules. These two rules will be put inside an aggregation named “MapR” and you have an overall status. You can test with manually modifying the node status inside the aggregation if the desired overall status is right.

You can find the complete BI documentation in the article “Reduce complexity with Business Intelligence”.
Once your BI aggregation is as you want it, it can be brought back into the monitoring as a service with the BI special agent.

marbaa · February 19, 2021, 12:41pm

Thanks Andreas. I played with BI a little, but to check its state you need to navigate away from the host. The second thing is that our company users fetch data from Check_MK through Thruk (for historical reasons).

I think I will stay with local checks and the doubled output text, as there is no easy solution (if one is possible at all) for what I want.

openmindz · February 21, 2021, 7:29pm

Hi @marbaa

While BI is extremely powerful, and I believe Andreas has given you a good example to use as a starting point, it is a “science unto itself”: It does have some complexity of its own to set up in a useful way. Not impossible, but as far as I understand you, you currently need something simpler.

I haven’t ever used “MapR”, so I don’t have any experience I could share, but
maybe the following approach works for you:

Create a cluster object, assign a Cluster IP. If a Cluster IP isn’t applicable for your cluster, re-read the note under the example in Section 2.1 of the previously mentioned article in the official documentation in order to modify this accordingly with a different “Host Check command”.
Assign the nodes, and the clustered services e.g. your file system that you share via NFS or whichever other services you deem critical, as described in Section 2.2.
Run a discovery for the cluster, and the nodes, so that the services that are “clustered” are “moved” to your cluster object, as mentioned in Section 2.3, to avoid “duplicates” as you said.

And now to your local check:

Directly at the end of Section 2.3 there is a “Tip” that may be of interest for you, which I’ll partially quote:

“Use the ruleset Settings for local checks to influence the result by choosing between Worst state and Best state.”

I’ve never used this ruleset, but it does sound like it could be applicable for you.

HTH,
Thomas

marbaa · February 22, 2021, 8:05am

Hi @openmindz,

thanks for the steps. I had already followed them, but I always see the output from both checks with the prefix “On node”.

I think it is not possible to achieve what I want with pure local checks. A local check will always add “On node” to the output.

For a single output I will probably need to write a check plugin that sets the parameter node_info: True, and an agent plugin that sends the output text, in my case the filesystem usage.
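A rough sketch of what such a check plugin might look like with the Check_MK 1.6 check API, where `node_info: True` makes each agent line arrive with the node name as its first column. The section name `mapr_fs_space`, the single-value agent output, and the service description are assumptions for illustration only; `check_info` is normally provided by Check_MK and is defined here as a stand-in so that the sketch runs on its own.

```python
# Stand-in for the dictionary Check_MK 1.6 provides to check files.
check_info = {}


def inventory_mapr_fs(info):
    # Discover one service without an item as soon as any node delivered data.
    if info:
        yield None, {}


def check_mapr_fs(_no_item, _no_params, info):
    # With node_info enabled, every line starts with the node name.
    # Assumed agent section (hypothetical): one line holding the used percentage.
    for line in info:
        _node, used_percent = line[0], float(line[1])
        # Report the first node that delivered data and ignore the rest,
        # since both REST endpoints are expected to return the same values.
        text = "%.2f%% space used" % used_percent
        return 0, text, [("used_percent", used_percent)]
    return 3, "No data received from any node"


check_info["mapr_fs_space"] = {
    "inventory_function": inventory_mapr_fs,
    "check_function": check_mapr_fs,
    "service_description": "MapR filesystem space",
    "has_perfdata": True,
    "node_info": True,
}
```

Because the check function decides which node’s line to use, the service text is produced once and carries no “On node” prefix.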

That’s the thing I’m going to try next, I will write results afterwards.

That TIP doesn’t do anything useful

openmindz · February 28, 2021, 7:17pm

Hi @marbaa

OK, now I get what you mean, and what your initially posted problem is. As far as I understand, I have to say that your “issue” is more of a “cosmetic” nature. I believe that this can easily be overcome by good internal documentation and communication regarding this particular check, but… I may be wrong. I know: people can be weird, and not be able to “tolerate” this “duplicated output”. Seriously though: Is that such a big deal?

In any case, I did some very brief research about MapR and what I found, is perhaps not really news to you, but that thing has apparently its own CLI (maprcli) one could probably (ab)use to write a check that will do what you want:

https://docs.datafabric.hpe.com/62/ReferenceGuide/maprcli-REST-API-Syntax.html

There is a myriad of commands, such as node list or dashboard info which do look useful… maybe others, too. I’m sure as someone who is working with that thing, such as yourself, you’ll have some experience with it, and will be able to find the right combination of commands to achieve what you want: It shouldn’t be too hard to do.

In summary

What you need to code into your script - whether you want to use it as local check or via MRPE - is some sort of condition, that, in case both your nodes are active, only generates the output you posted above, on one of them… which is what Andreas already said above. This shouldn’t lead to those “pesky” duplicates anymore.

I wanted to add that you will not find any documentation on how to do that, because it strictly depends on the application you want to write it for (in your case: that MapR thingie…): Without wanting to sound rude, you will need to come up with the logic. Again, the existing CLI toolbox should provide plenty of stuff to play around with.

Oh and by the way - quoting myself - the tip to use the ruleset Settings for local checks does do something useful, but… perhaps not for your case, so I’m sorry if I misled you with that advice. With all that said, I sincerely hope that you will come up with a satisfying solution for your issue, and please do post your results: I’m sure they will provide helpful advice for similar issues others have…

Thomas

marbaa · March 18, 2021, 10:13am

Sorry, it took me a while to get back to this issue; I have been super busy with work.

@openmindz Yup, I know about maprcli. Actually, getting the info is faster and easier with Python and REST than with maprcli, whether called from bash or from Python.
But that is not the point; it doesn’t matter how I get the data. What matters is how I output it to the Check_MK portal.

I have been thinking about the logic in my plugins that you and Andreas mentioned. I’m trying, but I can’t understand how that logic would help.

A local check (any check) must produce output. If it doesn’t, Check_MK will either make the service vanish (for example when the script has no execute permission) or set it to UNKN. Do you agree?

So if I compare the output of both scripts and echo only a final result, that will not help, because a local check must produce output.

I don’t know. I think I will give up on this, to not waste your time anymore.

openmindz · March 25, 2021, 6:12pm

Hi @marbaa

Same here my friend: So much to do, so little time. Despite the fact that I’m more at home than ever, the time I really have for myself, or for stuff I like (e.g. CMK) is less than ever before… I could imagine that this isn’t news to you… and I’m also sure that this will be true for many people all over the world, especially us “IT Monkeys”… Anyway, I digress…

Yes, that’s correct. Here’s an idea for a different approach:

maprcli - but I’m sure whichever method you employ, can be used just as well - has this subcommand node list. As far as I can see in the documentation, this returns the total number of nodes, an “id” and also the “health” for each node. Valuable information to build upon…
You could run a node list (or similar), on every node and then e.g. determine whether all of them are healthy. After you have verified that, you could write up a condition, which executes your check only on the first healthy node. On the rest of the nodes, you would then only output something like: “Node healthy, actual check result gathered on node <NODENAME>”, or something like that.

So an “ASCII mock-up” of your screenshot in your initial post, would then look something like this:

mapr_fs_space - On node NODE1: 62,80% space used etc. On node NODE2: Node healthy, please check output of NODE1.
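The idea above could be sketched roughly as follows. This is only an illustration: the `nodes` mapping of node name to health flag is hypothetical (a real script would fill it from `node list` or the REST API), and the output follows the local check line format of status, service name, performance data (or `-`), and text.

```python
def pick_reporting_node(nodes):
    """Return the first healthy node by name, or None if no node is healthy."""
    healthy = sorted(name for name, is_healthy in nodes.items() if is_healthy)
    return healthy[0] if healthy else None


def local_check_line(my_node, nodes, used_percent, service="mapr_fs_space"):
    """Build the local check line that the script on my_node would print."""
    reporter = pick_reporting_node(nodes)
    if reporter is None:
        return f"2 {service} - No healthy node found"
    if not nodes.get(my_node, False):
        return (f"1 {service} - Node not healthy, "
                f"actual check result gathered on node {reporter}")
    if my_node == reporter:
        # Only the elected node reports the real value and performance data.
        return (f"0 {service} used_percent={used_percent:.2f} "
                f"{used_percent:.2f}% space used")
    return f"0 {service} - Node healthy, actual check result gathered on node {reporter}"


# Hypothetical cluster state; every node runs the same script with its own name.
nodes = {"NODE1": True, "NODE2": True}
for node in sorted(nodes):
    print(local_check_line(node, nodes, 62.8))
```

Every node still prints a line, so the service never goes missing, but only one node carries the actual value.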

How does that sound to you?

Take care,
Thomas

marbaa · March 31, 2021, 8:45am

Hi @openmindz,

thanks for another possible solution. But you don’t have to go through the MapR documentation on how to collect data from MapR; the way the data is collected is not relevant.

What matters is the behaviour of Check_MK itself and how it displays output when a service is clustered. I can’t tell for sure, but I think that when we were using Check_MK 1.4 at the company, the text “On node” was not shown in local checks. Maybe it was introduced in version 1.6.

Yesterday I spun up my virtual test environment, installed Check_MK 2.0, and ran the test script there, just out of curiosity, and you know what?

It shows the output as I want: no “On node” text is automatically added to the output. Both tests used check_mk_agent from 1.6.

This is 1.6

And this is 2.0

The output in 2.0 is taken from the host that is listed first in the cluster settings. When I put ‘client2’ in first place, the output shown is from ‘client2’.

Removed execute permission from test script on ‘client1’:
1.6

2.0

What is new in 2.0 is that it shows the output from both nodes in the Check_MK service. It shows both outputs even if I define the Check_MK service as a clustered service.

When I find time, maybe I will try to test it on Check_MK 1.4.

openmindz · March 31, 2021, 9:34am

Hi @marbaa

Cool, thanks for sharing, that is indeed very interesting.
