Aggregation options for clustering services

CMK version: 2.1.0p32

I cannot get it to work with Filesystems.

I’ve tried just “Clustered services” first, and you already know what happens:

Filesystems appear perfectly, but with an “UNKNOWN” state, and that weird message about “native cluster is not supported for this kind of service, use aggregations”.

Well, I’ve tried aggregations, with Native, Failover, Best, Worst.
All of them. No filesystem appears.

With 1.2.6, 1.4.0, it was so easy.
Guys, please, what are the exact steps to get it to work?

Thank you!!

Same in 2.2.0p8. I have played with all the options.

Is there any quick fix for this?
This was 100% straightforward before.

I can only shed some light on the background. For support, a community member would have to help.
Before 2.1, the behavior was not defined in some cases, which led to arbitrary results for clustered services. Therefore, if a check does not implement cluster handling natively, you as a user now need to make a conscious decision about the aggregation mode.
Make sure that the aggregation option rule applies to the services and hosts you want to cluster.

By the way, this explains the history and options.

From 19:20 on

In the “Aggregation options for clustered services” rule, you can specify the aggregation mode for services without their own check-specific cluster function.

For FS, this would normally be “Worst Node wins”, since you want to be notified when an FS runs out of space.


Thank you, already saw it.

Yes. Correct. The problem I’m seeing is that they stay UNKNOWN, as if they were not being discovered correctly.

Suppose → Cluster → Cluster01
Nodes → Node01, Node02

Under “Explicit hosts”, you put Node01 and Node02, right? You don’t put the cluster name.
What am I doing wrong? Am I missing something?

Thanks!!

OMG, I didn’t want to believe it.
You have to specify the cluster under “Explicit hosts”, not the nodes.

It’s working :smiley:
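For anyone hitting the same thing, here is a minimal Python sketch of why the rule never matched. It is a toy model, not Checkmk code: `rule_applies` and its parameters are hypothetical names, and matching the service pattern as a regex anchored at the start of the service name is an assumption. The point is only that the clustered service lives on the cluster host (Cluster01), so that is the name the rule condition is compared against.

```python
import re


def rule_applies(explicit_hosts, service_pattern, host, service):
    """Toy rule condition: host must be listed, service must match the pattern.

    Hypothetical helper for illustration only, not part of Checkmk.
    """
    host_matches = host in explicit_hosts
    # Assumption: the pattern is matched from the start of the service name.
    service_matches = re.match(service_pattern, service) is not None
    return host_matches and service_matches


# The clustered filesystem service belongs to the cluster host.
cluster_host = "Cluster01"
service_name = "Filesystem /data"  # example name, assumed for illustration

# Rule listing the nodes: never matches the service on the cluster host.
nodes_rule = rule_applies(["Node01", "Node02"], "Filesystem", cluster_host, service_name)

# Rule listing the cluster: matches.
cluster_rule = rule_applies(["Cluster01"], "Filesystem", cluster_host, service_name)

print(nodes_rule, cluster_rule)
```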

Now, I guess I have one last question.
I see the same output for Failover, Worst or Best.

Filesystems are clustered, obviously, but they are mounted on one node at a time.


So, my question: is there any real difference between the 3 options for filesystems?

Thanks!

OK. I’ve done my testing.
What I’ve seen seems to be exactly what that capture from the presentation shows.

Failover. If I reboot the node with all the disks mounted, I get a WARNING, which is not pretty, until everything is back online; then it shows all the disks on the other node. All OK.

Worst. Not pretty. Some disks turn to UNKNOWN until everything is back online.

Best. This seems to be the best option. Always green. Never a warning or unknown.

I guess this is it :slight_smile:


Happy that the presentation is correct :slight_smile:
@thkos - Gerardo has some further input for the update of the cluster article

We will come back to this post (or even to you, @gera83) when we start updating the article on clustered services.

@gera83 you want to make sure you understand what the modes actually do:

  • Failover expects one and only one node to provide data for a service.
  • Worst means that as soon as one node experiences an issue, you will be notified.
  • Best means that as long as at least one node is fine, the clustered service is fine.

And the UNKN and WARN states you mention can be valid states. For Worst, the UNKN is correct while the disks are moving to the other node, because during that time Checkmk receives no data. The same applies to Failover: if the disks are not online, or can be seen on both nodes, then you get a warning. This might not actually be what you are looking for, but it is certainly valid.
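To make the three modes concrete, here is a small Python sketch of how I would model them. It is a simplified illustration, not Checkmk's implementation: `State`, `_SEVERITY` and `aggregate` are hypothetical names, and the severity ranking (OK, WARN, UNKNOWN, CRIT from mildest to worst) is an assumption. Only nodes that actually delivered data for the service are passed in; the no-data case described above is deliberately not modeled.

```python
from enum import IntEnum


class State(IntEnum):
    OK = 0
    WARN = 1
    CRIT = 2
    UNKNOWN = 3


# Assumed ranking, mildest first: CRIT counts as worse than UNKNOWN.
_SEVERITY = [State.OK, State.WARN, State.UNKNOWN, State.CRIT]


def aggregate(mode, node_states):
    """Combine per-node states of one clustered service (toy model).

    node_states maps node name -> State, only for nodes that reported data.
    """
    if not node_states:
        # Periods without any data are outside the scope of this sketch.
        raise ValueError("no node reported data; not modeled here")

    states = list(node_states.values())
    worst = max(states, key=_SEVERITY.index)
    best = min(states, key=_SEVERITY.index)

    if mode == "worst":
        return worst
    if mode == "best":
        return best
    if mode == "failover":
        if len(states) == 1:
            # Exactly one node provides data: take its state as is.
            return states[0]
        # Seen on more than one node: at least WARN, as described above.
        return max(worst, State.WARN, key=_SEVERITY.index)
    raise ValueError(f"unknown mode: {mode}")


both = {"Node01": State.OK, "Node02": State.CRIT}
print(aggregate("worst", both), aggregate("best", both), aggregate("failover", both))
```

With a filesystem mounted on a single node, all three modes receive one state and return it unchanged, which fits the identical output seen in normal operation.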

Just want to make sure you got the right solution.

Hi Robin. Totally correct. Remember that I just need filesystem usage. I don’t need an UNKNOWN. I just need OK, WARN, CRIT for disk space levels.

If there is indeed a problem with the disk, I have it monitored with other methods.

Thank you!!

@martin.hirschvogel I think the correct representation for “Best” should be

image

Am I wrong?

Hi,

Both are correct, yours and mine. As soon as one node is “OK”, the service is evaluated as OK for “Best”.