Aggregation options for clustering services

gera83 · August 27, 2023, 1:18am

CMK version: 2.1.0p32

I cannot get it to work with Filesystems.

I first tried just “Clustered services”, and you already know what happens:

The filesystems appear perfectly, but with an “UNKNOWN” state and that weird message about “native cluster is not supported for this kind of service, use aggregations”.

Well, I’ve tried aggregations with Native, failover, best and worst.
All of them. No filesystem appears.

With 1.2.6 and 1.4.0, it was so easy.
Guys, please, what are the exact steps to get it to work?

Thank you!!

gera83 · August 28, 2023, 12:27pm

gera83 · August 29, 2023, 4:37am

Same in 2.2.0p8. I have played with all the options.

Is there any quick fix for this?
This was 100% straightforward before.

martin.hirschvogel · August 29, 2023, 6:12am

I can only shed some light on the background. For support, a community member would have to help.
But before 2.1, the behavior was undefined in some cases, leading to arbitrary results for clustered services. Therefore, if a check does not implement clustering natively, you as a user now need to make a conscious decision.
Make sure that the aggregation option rule applies to the services and hosts you want to cluster.

martin.hirschvogel · August 29, 2023, 8:10am

By the way, this explains the history and options.

From 19:20 on

LaSoe · August 29, 2023, 8:29am

In the “Aggregation options for clustered services” rule, you can specify the aggregation mode for services without their own check-specific cluster function.

For FS, this would normally be “Worst Node wins”, since you want to be notified when an FS runs out of space.
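As an illustration of what “Worst node wins” means for filesystem usage, here is a minimal Python sketch. It is not Checkmk code: the function names, the example levels of 80 % (WARN) and 90 % (CRIT) used space, and the severity order OK < WARN < UNKNOWN < CRIT used to pick the worst state are all assumptions made for this example.

```python
# Minimal sketch of "Worst node wins" for a clustered filesystem service.
# Assumptions (not Checkmk code): example levels of 80 %/90 % used space and
# the severity order OK < WARN < UNKNOWN < CRIT for picking the worst state.
from typing import Dict

SEVERITY = {"OK": 0, "WARN": 1, "UNKNOWN": 2, "CRIT": 3}


def fs_state(used_percent: float, warn: float = 80.0, crit: float = 90.0) -> str:
    """Classify one node's filesystem usage against warn/crit levels."""
    if used_percent >= crit:
        return "CRIT"
    if used_percent >= warn:
        return "WARN"
    return "OK"


def worst_node_wins(usage_by_node: Dict[str, float]) -> str:
    """The most severe per-node state becomes the state of the clustered service."""
    if not usage_by_node:
        return "UNKNOWN"  # no node reported the filesystem at all
    states = [fs_state(used) for used in usage_by_node.values()]
    return max(states, key=SEVERITY.__getitem__)


print(worst_node_wins({"Node01": 72.5, "Node02": 91.0}))  # CRIT
print(worst_node_wins({"Node01": 72.5}))                  # OK
print(worst_node_wins({}))                                # UNKNOWN
```

The point of choosing the worst state is that a filesystem running full on any node is never hidden by a healthy node.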

gera83 · August 29, 2023, 10:46am

Thank you, already saw it.

gera83 · August 29, 2023, 10:48am

Yes, correct. The problem I’m seeing is that they stay UNKNOWN, as if they were not being discovered correctly.

Suppose → Cluster → Cluster01
Nodes → Node01, Node02

Under “explicit hosts”, you put Node01 and Node02, right? You don’t put the cluster name.
What am I doing wrong? Am I missing something?

Thanks!!

gera83 · August 29, 2023, 12:52pm

OMG, I didn’t want to believe it.
You have to specify the cluster on “explicit hosts”, not the nodes.

It’s working
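The pitfall above can be modelled in a few lines. This is a hypothetical sketch, not Checkmk's rule engine: it assumes that the aggregation rule is matched against the name of the cluster host (Cluster01) that owns the clustered services, and that the first matching rule wins. The class and function names are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AggregationRule:
    """Hypothetical stand-in for an "Aggregation options for clustered services" rule."""
    mode: str                                    # "failover", "worst" or "best"
    explicit_hosts: set[str] = field(default_factory=set)

    def applies_to(self, host_name: str) -> bool:
        return host_name in self.explicit_hosts


def effective_mode(rules: list[AggregationRule], cluster_host: str) -> Optional[str]:
    """Mode of the first rule matching the cluster host (first-match semantics assumed)."""
    for rule in rules:
        if rule.applies_to(cluster_host):
            return rule.mode
    return None  # no matching rule: the service stays UNKNOWN, as described in this thread


# Listing the nodes does not match, because the lookup uses the cluster's name.
print(effective_mode([AggregationRule("best", {"Node01", "Node02"})], "Cluster01"))  # None
print(effective_mode([AggregationRule("best", {"Cluster01"})], "Cluster01"))         # best
```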

Now I guess I have one last question.
I see the same output for failover, worst or best.

Filesystems are clustered, obviously, but they are mounted on one node at a time.

So, my question: is there any real difference between the three options for filesystems?

Thanks!

gera83 · August 29, 2023, 1:32pm

OK. I’ve done my testing.
What I’ve seen seems to match exactly what the screenshot from the presentation shows.

Failover: If I reboot the node with all the disks mounted, I get a WARNING, which is not pretty, until everything is back online; then it shows all the disks on the other node. All OK.

Worst: Not pretty. Some disks turn to UNKNOWN until everything is back online.

Best: This seems to be the best option. Always green, never a WARNING or UNKNOWN.

I guess this is it

martin.hirschvogel · August 30, 2023, 7:45am

Happy that the presentation is correct.
@thkos - Gerardo has some further input for the update of the cluster article

thkos · August 31, 2023, 11:23am

We will come back to this post (or even to you, @gera83) when we start updating the article on clustered services.

robin.gierse · September 4, 2023, 7:34am

@gera83, you want to make sure you understand what the modes actually do:

Failover expects one and only one node to provide data for a service.
Worst means that as soon as one node experiences an issue, you will be notified.
Best means, as long as at least one node is fine, the clustered service is fine.

And the UNKN and WARN states you mention can be valid states. For Worst, UNKN is correct if the disks need to move to the other node, because during that time Checkmk receives no data. The same applies to Failover: if the disks are not online, or can be seen on both nodes, you get a warning. This might not actually be what you are looking for, but it is certainly valid.
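To make these descriptions concrete, here is a simplified Python model of the three modes. It follows the wording in this thread, not Checkmk's actual implementation. Assumptions: each node reports a state for the filesystem, or None if it does not carry it; a rebooting node that should carry the filesystem is represented as UNKNOWN because no data arrives; and states are ranked OK < WARN < UNKNOWN < CRIT. Under these assumptions, the switchover phase gives WARN for Failover, UNKNOWN for Worst and OK for Best, which matches the test results reported above.

```python
# Simplified model of the three aggregation modes as described in this thread;
# NOT Checkmk's implementation. None = node does not carry the filesystem.
from typing import List, Optional, Sequence

SEVERITY = {"OK": 0, "WARN": 1, "UNKNOWN": 2, "CRIT": 3}  # assumed ranking


def _reporting(node_states: Sequence[Optional[str]]) -> List[str]:
    """States of the nodes that report this service at all."""
    return [s for s in node_states if s is not None]


def failover(node_states: Sequence[Optional[str]]) -> str:
    """Exactly one node should provide data; none or several -> WARN."""
    reporting = _reporting(node_states)
    return reporting[0] if len(reporting) == 1 else "WARN"


def worst(node_states: Sequence[Optional[str]]) -> str:
    """Any node with a problem degrades the service; no data at all -> UNKNOWN."""
    reporting = _reporting(node_states)
    return max(reporting, key=SEVERITY.__getitem__) if reporting else "UNKNOWN"


def best(node_states: Sequence[Optional[str]]) -> str:
    """As long as one node is fine, the service is fine; no data at all -> UNKNOWN."""
    reporting = _reporting(node_states)
    return min(reporting, key=SEVERITY.__getitem__) if reporting else "UNKNOWN"


phases = {
    "normal": {"Node01": "OK", "Node02": None},
    "switchover": {"Node01": "UNKNOWN", "Node02": "OK"},  # Node01 rebooting
    "after move": {"Node01": None, "Node02": "OK"},
}
for phase, states in phases.items():
    values = list(states.values())
    print(phase, failover(values), worst(values), best(values))
# normal OK OK OK
# switchover WARN UNKNOWN OK
# after move OK OK OK
```

Which mode is "right" depends on what the service should tell you: Best hides the transient switchover, while Worst and Failover surface it.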

Just want to make sure you got the right solution.

gera83 · September 14, 2023, 10:09pm

Hi Robin. Totally correct. Remember that I just need filesystem usage. I don’t need an UNKNOWN; I just need OK, WARN and CRIT for disk space levels.

If there is indeed a problem with the disk, I have it monitored with other methods.

Thank you!!

Davide · November 28, 2023, 5:18pm

@martin.hirschvogel I think the correct representation for “Best” should be

Am I wrong?

martin.hirschvogel · November 29, 2023, 12:41pm

Hi,

both are correct, yours and mine. As soon as there is one node in “OK”, it is evaluated as OK for “Best”.
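Why both representations agree can be checked by brute force: since OK is the best possible state, “the best node state is OK” and “at least one node is OK” are the same condition. The Python sketch below compares both formulations for every combination of states on clusters of one to three nodes. It assumes the ranking OK < WARN < UNKNOWN < CRIT and is an illustration, not Checkmk code.

```python
from itertools import product

# Assumed ranking, best to worst; not taken from Checkmk's source.
SEVERITY = {"OK": 0, "WARN": 1, "UNKNOWN": 2, "CRIT": 3}


def best(states):
    """'Best' mode: the least severe node state becomes the service state."""
    return min(states, key=SEVERITY.__getitem__)


def descriptions_agree(max_nodes: int) -> bool:
    """True if 'best state is OK' equals 'some node is OK' for every state combination."""
    for n in range(1, max_nodes + 1):
        for combo in product(SEVERITY, repeat=n):
            if (best(combo) == "OK") != ("OK" in combo):
                return False
    return True


print(descriptions_agree(3))  # True
```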