Hi all, I have set up monitoring of a HANA DB on a Pacemaker cluster using the doc https://docs.checkmk.com/latest/en/clustered_services.html#:~:text=In%20a%20distributed%20monitoring%20all,for%20the%20IP%20address%20family (modifying the steps for Linux as appropriate). Now all the clustered resources show up as WARN in WATO (monitoring) and in Nagstamon. I am not sure what is wrong, but from the Checkmk documentation I think the resources are in WARN status because of “Additional results from: ”, which I am not sure how to correct. I am wondering whether it is because we use short names in the PCS cluster config but FQDNs in the Checkmk config?
I am using the “Failover” option in the cluster aggregation rule, and the resources are currently active on node 1 (-cl1). I am looking for the configuration changes I need to make to get rid of the WARN status of the clustered resources.
These services must not be assigned to the cluster host.
They have individual results on each node and are not migrated by pacemaker/corosync.
Hi Robert, apologies but I don’t understand your comment. The doc says to use the clustered services aggregate rule to associate the clustered services with the Cluster host i.e.,
1/ Create a special “cluster-type” host that represents the cluster (e.g., in my case this is the cluster IP) and add the actual physical nodes to it as nodes.
2/ Then create an aggregate clustered resources rule tied to that clustered host (and choose the cluster mode - native, failover, etc.), and add all the clustered resources to that rule.
3/ Run a bulk service discovery on the two physical hosts, which causes the clustered resources to be associated with the cluster host (rather than with the two individual physical hosts in the cluster).
Please let me know if I read that wrong and what I need to do to fix it. Thanks!
What @r.sander said is that the services shown in your screenshot are not clustered services. Services that should be clustered are those that run on only one cluster node at a time in a failover cluster, or that run on both nodes in an active-active cluster.
Something like a database or web server or a VM in a Proxmox cluster, anything like that.
What you need to do now is specify which service is actually clustered, and first monitor that service correctly on the node without any cluster rule. Then you can create the rule to cluster these services.
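To illustrate why per-node services end up in WARN, here is a rough Python sketch of how I understand the “Failover” aggregation. It is a simplified model and not Checkmk's actual code: it assumes that Failover expects exactly one node to deliver a result, and that any extra node raises the state to at least WARN with an “Additional results from: ” note. The node names `node-cl1` and `node-cl2` and the service summaries are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Monitoring states as plain integers
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3


@dataclass
class NodeResult:
    node: str      # node that reported the service (hypothetical names below)
    state: int     # state the check produced on that node
    summary: str   # summary text from that node


def aggregate_failover(results: List[NodeResult]) -> Tuple[int, str]:
    """Simplified model of failover aggregation (an assumption, not Checkmk code).

    Exactly one node is expected to report. Results from any further node
    raise the state to at least WARN and are listed in the summary.
    """
    if not results:
        return UNKNOWN, "No node reported this service"
    primary, *extra = results
    state = primary.state
    summary = f"[{primary.node}] {primary.summary}"
    if extra:
        state = max(state, WARN)
        summary += ", Additional results from: " + ", ".join(r.node for r in extra)
    return state, summary


# A service that really fails over: only the active node reports it.
clustered = aggregate_failover([NodeResult("node-cl1", OK, "DB instance running")])

# A per-node service (e.g. a local filesystem): both nodes always report it.
per_node = aggregate_failover([
    NodeResult("node-cl1", OK, "42% used"),
    NodeResult("node-cl2", OK, "40% used"),
])
```

In this model `clustered` stays OK, while `per_node` goes to WARN even though both nodes are fine. That matches the symptom described above: services that exist on every node produce a result on every node, so they should stay on the physical hosts and not be assigned to the cluster host.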
Maybe follow my post here:
regards
Michael
