Local checks in cluster

Hi,

We have a 5-node server cluster running MapR. Node1 runs the REST API, and so does Node2. My local check on node1 queries the REST API on node1, and the local check on node2 queries the REST API on node2.
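For readers unfamiliar with local checks, here is a minimal sketch of what such a script might print. It assumes the usual local check line format "<state> <service_name> <metrics> <details>", with metrics written as name=value;warn;crit;min;max. The service name MapR_FS, the metric name fs_used_pct, the 80/90 % thresholds and get_used_percent() are hypothetical stand-ins for the real REST query. Because the same script runs on Node1 and Node2, the clustered service receives one such line per node.

```python
#!/usr/bin/env python3
"""Minimal sketch of a Checkmk local check reporting MapR filesystem usage.

Hypothetical assumptions: service name "MapR_FS", metric "fs_used_pct",
80/90 % thresholds. get_used_percent() stands in for the real REST query.
"""


def get_used_percent() -> float:
    # Hypothetical placeholder: replace with the query against the
    # MapR REST API on this node.
    return 42.0


def local_check_line(used_pct: float, warn: float = 80.0, crit: float = 90.0) -> str:
    # Local check format: <state> <service_name> <metrics> <details>
    if used_pct >= crit:
        state = 2
    elif used_pct >= warn:
        state = 1
    else:
        state = 0
    # Metric notation: name=value;warn;crit;min;max
    metrics = f"fs_used_pct={used_pct:.1f};{warn:g};{crit:g};0;100"
    return f"{state} MapR_FS {metrics} Filesystem usage {used_pct:.1f}%"


if __name__ == "__main__":
    print(local_check_line(get_used_percent()))
```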

In Check_MK I then created a clustered host consisting of the real hosts Node1 and Node2.

The thing is, the output is now doubled, and each copy has an automatically added prefix showing which node the check ran on. Is it possible to hide that prefix and output just one text? Or is that not possible with local checks, so that I need to turn them into MRPE checks? Or do my scripts need to be adapted somehow?

I would like to avoid rewriting the script for MRPE usage, as I don’t know how to pass performance data with an MRPE-style script.
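On the performance data question: MRPE runs ordinary Nagios-style plugins, which pass performance data by printing "<text> | <perfdata>" on the first output line and signal the state through the exit code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN). A minimal sketch under that convention; the metric name, thresholds and get_used_percent() are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of an MRPE (Nagios-plugin-style) script that passes performance data.

Convention: first output line is "<text> | <perfdata>", exit code is the state.
Metric name, thresholds and get_used_percent() are hypothetical.
"""
import sys
from typing import Tuple

STATE_NAMES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def get_used_percent() -> float:
    # Hypothetical placeholder for the real MapR REST query.
    return 42.0


def plugin_output(used_pct: float, warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    if used_pct >= crit:
        state = 2
    elif used_pct >= warn:
        state = 1
    else:
        state = 0
    text = f"{STATE_NAMES[state]} - Filesystem usage {used_pct:.1f}%"
    # Perfdata: label=value[UOM];warn;crit;min;max
    perfdata = f"fs_used_pct={used_pct:.1f}%;{warn:g};{crit:g};0;100"
    return state, f"{text} | {perfdata}"


if __name__ == "__main__":
    try:
        code, line = plugin_output(get_used_percent())
    except Exception as exc:  # any failure while querying MapR -> UNKNOWN
        code, line = 3, f"UNKNOWN - {exc}"
    print(line)
    sys.exit(code)
```

Such a script would then be registered in the agent's mrpe.cfg with a line of the form "<service description> <command line>", for example MapR_FS /path/to/check_mapr_fs.py (path hypothetical).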

Has nobody tried local checks in a cluster?
Is it not possible to have the output printed only once, as with MRPE?
In the picture, the “test_plugin” check is an MRPE script and the “testing” check is a local script.

Your example shows that MRPE is not cluster-aware and simply prints the first result.
The local check prints all results. That is the correct behavior.

What you need is a real check that can handle conditions such as how many nodes of a cluster are running, and so on.

I thought that if I put the MRPE script on two servers, created a virtual cluster in Check_MK and assigned the service under “Clustered services for overlapping clusters”, that would be it. So I was wrong, and some of our checks have been running with a misleading cluster status for quite a long time. That’s not good.

No idea how to do that. I can’t find any documentation or examples for that.

Hi @marbaa

As I understand the section “overlapping clusters”, here:

You would only need this if you are, e.g., running a service that runs on “both nodes” and is shared by, let’s say, two VIPs. If this is your scenario, then you haven’t done anything wrong (again: if I understand the aforementioned section correctly).

I realized that for many (if not most) of my clusters I don’t need “overlapping” ones after all, but that may differ on your end. I believe what Andreas meant, though, isn’t something that you will find in the documentation.

I think he meant that your script will need some built-in logic that “understands”, e.g., on which node a particular service is “active”, so that your check output reflects the correct state of your cluster.

Thomas

Thanks @openmindz. I’ve read that section now, and I’m even more confused than before :smiley: Probably I don’t understand the documentation correctly and don’t know what to use for my need.

MapR is basically its own NFS filesystem implementation spread over a few identical servers. The REST API service runs on two of these servers. Each server has its own IP, so each instance of the service also has its own IP. Information about the filesystem is available from either the first or the second instance. The output from both is the same; they run on two hosts just for redundancy.

Can you recommend what type of check I should use for that, so that I don’t get duplicate output like in my first post?

In this case it would be best to use the BI function to create virtual objects and then use these objects for notifications as the cluster status.

I’ll try to describe it.

Create a BI rule to aggregate the REST services. The aggregation function should be “best” in this case since, as you said, the service must run once but can run twice.
Create a BI rule for the other services where the aggregation function is something like “over 60% is ok and over 40% is warning”.
Now you have two aggregation rules. These two rules are put inside an aggregation named “MapR”, which gives you an overall status. You can test whether the overall status is what you want by manually modifying the node states inside the aggregation.
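To make the two rules concrete, here is a plain-Python illustration of the logic. It is not Checkmk’s BI code or configuration. The 60 %/40 % thresholds come from the description above; the worst-of rule for the top-level “MapR” aggregation and the state ranking (CRIT treated as worse than UNKNOWN) are assumptions.

```python
"""Illustration only (not Checkmk's BI implementation) of the aggregation
functions described above. States: 0=OK, 1=WARN, 2=CRIT, 3=UNKNOWN.
"""
from typing import List

OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3
# Assumed severity ranking for best/worst: CRIT is worse than UNKNOWN.
SEVERITY = {OK: 0, WARN: 1, UNKNOWN: 2, CRIT: 3}


def best(states: List[int]) -> int:
    return min(states, key=SEVERITY.__getitem__)


def worst(states: List[int]) -> int:
    return max(states, key=SEVERITY.__getitem__)


def count_ok(states: List[int], ok_above: float = 60.0, warn_above: float = 40.0) -> int:
    # "over 60 % OK -> OK, over 40 % OK -> WARN, otherwise CRIT"
    if not states:
        raise ValueError("an aggregation needs at least one child")
    pct_ok = 100.0 * states.count(OK) / len(states)
    if pct_ok > ok_above:
        return OK
    if pct_ok > warn_above:
        return WARN
    return CRIT


# REST services on Node1 and Node2: one running instance is enough.
rest = best([CRIT, OK])
# Other MapR services on the five nodes: 3 of 5 OK is exactly 60 %, so WARN.
other = count_ok([OK, OK, OK, CRIT, CRIT])
# Hypothetical top-level "MapR" aggregation using worst-of.
overall = worst([rest, other])
print(rest, other, overall)
```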

In “Reduce complexity with Business Intelligence” you can find the complete BI documentation.
Once your BI aggregation works the way you want, it can be brought back into the monitoring as a service with the BI special agent.

Thanks Andreas. I played a little with BI, but to check its state you have to navigate away from the host. The second thing is that, for historical reasons, our company users use Thruk to fetch data from Check_MK.

I think I will stay with local checks and the doubled output text, as there is no easy solution (if any) for what I want.

Hi @marbaa

While BI is extremely powerful, and I believe Andreas has given you a good example to use as a starting point, it is a “science unto itself”: It does have some complexity of its own to set up in a useful way. Not impossible, but as far as I understand you, you currently need something simpler.

I haven’t ever used “MapR”, so I don’t have any experience I could share, but maybe the following approach works for you:

  • Create a cluster object and assign a cluster IP. If a cluster IP isn’t applicable for your cluster, re-read the note under the example in Section 2.1 of the previously mentioned article in the official documentation, and adjust this accordingly with a different “Host Check command”.

  • Assign the nodes and the clustered services (e.g. your file system that you share via NFS, or whichever other services you deem critical), as described in Section 2.2.

  • Run a service discovery for the cluster and the nodes, so that the clustered services are “moved” to your cluster object, as mentioned in Section 2.3, to avoid the “duplicates” you mentioned.

And now to your local check:

Directly at the end of Section 2.3 there is a “Tip” that may be of interest to you, which I’ll partially quote: Use the ruleset Settings for local checks to influence the result by choosing between Worst state and Best state.

I’ve never used this ruleset, but it does sound like it could be applicable for you.

HTH,
Thomas

Hi @openmindz,

thanks for the steps. I had already followed them, but I always see the output from both checks with the prefix “On node”.

I think it is not possible to achieve what I want with pure local checks alone. A local check in a cluster will always add “On node” to the output.

Probably, to get a single output, I will need to write a check plugin with the parameter node_info: True, and then an agent plugin that sends the output text, in my case the filesystem usage.
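A minimal sketch of what the check-plugin side could do with node-prefixed data. It assumes that, with node_info: True, each agent line reaches the check with the node name prepended as its first element, as in the legacy Check_MK check API. The agent plugin is assumed to print a section header such as <<<mapr_fs_usage>>> followed by key/value lines like “used_pct 42.0”; the section name, the keys and the “first node in sorted order” rule are hypothetical, and the function is plain Python rather than Check_MK API code.

```python
"""Sketch: collapse node-prefixed rows into a single result.

Assumed row layout with node_info enabled: [node, key, value],
e.g. ["node1", "used_pct", "42.0"]. Section layout and selection rule
are hypothetical; this is not Check_MK API code.
"""
from typing import Dict, List, Optional


def pick_single_node(info: List[List[str]]) -> Dict[str, str]:
    per_node: Dict[Optional[str], Dict[str, str]] = {}
    for node, key, value in info:  # each row: node name, key, value
        per_node.setdefault(node, {})[key] = value
    if not per_node:
        return {}
    # Both nodes report identical data, so take one deterministically.
    # A node that delivered no data is simply absent from per_node.
    chosen = sorted(per_node, key=lambda n: n or "")[0]
    return per_node[chosen]


rows = [
    ["node2", "used_pct", "42.0"],
    ["node1", "used_pct", "42.0"],
    ["node1", "total_gb", "100"],
]
print(pick_single_node(rows))
```

If node1’s agent delivers nothing, its rows are missing and node2’s data is used automatically.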

That’s what I’m going to try next; I will post the results afterwards.

That TIP doesn’t do anything useful :worried:

Hi @marbaa

OK, now I get what you mean and what your initially posted problem is. As far as I understand it, your “issue” is more of a “cosmetic” nature. I believe that this can easily be overcome by good internal documentation and communication regarding this particular check, but… I may be wrong. I know: people can be weird and unable to “tolerate” this “duplicated output”. Seriously though: is that such a big deal? :man_facepalming:

In any case, I did some very brief research about MapR, and what I found is perhaps not news to you, but it apparently has its own CLI (maprcli) that one could probably (ab)use to write a check that does what you want:

https://docs.datafabric.hpe.com/62/ReferenceGuide/maprcli-REST-API-Syntax.html

There is a myriad of commands, such as node list or dashboard info, which look useful, and maybe others, too. As someone who works with it, you’ll surely have some experience with it and will be able to find the right combination of commands to achieve what you want: it shouldn’t be too hard to do.
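A sketch of calling such commands over REST instead of maprcli, based on the URL syntax described on the linked page (https://<host>:<port>/rest/<command>?<parameters>, with the maprcli command words joined by “/”). The default port 8443, the “status” and “data” fields of the JSON reply and the column name are assumptions to verify against your MapR version; sending the request (authentication, TLS certificates) is left out.

```python
"""Sketch: build a MapR REST URL and unpack the JSON reply.

Assumptions to verify: default port 8443, and a reply of the form
{"status": "OK", "total": N, "data": [...]}. Sending the request
(authentication, TLS certificates) is deliberately left out.
"""
import json
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode


def mapr_rest_url(host: str, command: str,
                  params: Optional[Dict[str, str]] = None,
                  port: int = 8443) -> str:
    # "maprcli node list" becomes command="node/list"
    url = f"https://{host}:{port}/rest/{command}"
    if params:
        url += "?" + urlencode(params)
    return url


def parse_reply(body: str) -> List[Dict[str, Any]]:
    reply = json.loads(body)
    if reply.get("status") != "OK":
        # Surface MapR's error payload instead of silently returning nothing.
        raise RuntimeError(f"MapR REST call failed: {reply}")
    return reply.get("data", [])


# Hypothetical usage: each node queries only its own REST API instance.
print(mapr_rest_url("node1", "node/list", {"columns": "hostname"}))
```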

In summary

What you need to code into your script, whether you use it as a local check or via MRPE, is some sort of condition that, when both your nodes are active, generates the output you posted above on only one of them, which is what Andreas already said above. This should get rid of those “pesky” duplicates.
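One possible shape for such a condition, as a sketch under hypothetical assumptions: REST_NODES lists the hosts running the REST API, and rest_is_up(node) tells whether that node’s REST API answers, so each node must be able to probe the other one. The lowest-sorted node with a working REST API prints the result and the other node stays silent, assuming the clustered service is fine with a result from only one node. If no REST API answers, every node reports CRIT, so the duplicate only returns when something is genuinely wrong.

```python
"""Sketch: let only one node print the local check result.

Hypothetical assumptions: REST_NODES lists the hosts running the MapR REST
API, and rest_is_up(node) reports whether that node's REST API answers.
The lowest-sorted node with a working REST API reports; others stay silent.
"""
import socket
from typing import Callable, Iterable, Optional

REST_NODES = ["node1", "node2"]  # hypothetical host names


def designated_node(nodes: Iterable[str],
                    rest_is_up: Callable[[str], bool]) -> Optional[str]:
    # First node (in sorted order) whose REST API answers, or None.
    for node in sorted(nodes):
        if rest_is_up(node):
            return node
    return None


def local_check_output(me: str, nodes: Iterable[str],
                       rest_is_up: Callable[[str], bool]) -> Optional[str]:
    chosen = designated_node(nodes, rest_is_up)
    if chosen is None:
        # Nobody can serve the data: every node reports CRIT.
        return "2 MapR_REST - No MapR REST API answers on any node"
    if chosen == me:
        return f"0 MapR_REST - MapR REST API answering on {me}"
    return None  # another node reports, stay silent


if __name__ == "__main__":
    me = socket.gethostname().split(".")[0]
    # Hypothetical probe: replace with a real request to each node's REST API.
    line = local_check_output(me, REST_NODES, lambda node: True)
    if line:
        print(line)
```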

I wanted to add that you will not find any documentation on how to do that, because it depends entirely on the application you want to write it for (in your case: that MapR thingie…): without wanting to sound rude, you will need to come up with the logic yourself. Again, the existing CLI toolbox should provide plenty of stuff to play around with.

Oh and by the way, regarding my tip to use the ruleset Settings for local checks: it does do something useful, but… perhaps not for your case, so I’m sorry if I misled you with that advice. With all that said, I sincerely hope that you come up with a satisfying solution for your issue, and please do post your results: I’m sure they will help others with similar issues… :slight_smile:

Thomas