Proxmox cluster host broken after update to 2.2

CMK version: CRE 2.2.0p2
OS version: Ubuntu 20.04

After I upgraded from 2.1.0p29 to 2.2.0p2, my Proxmox “cluster” hosts no longer show the clustered services; they are reported as vanished:

If I check the service discovery on the cluster members, they are correctly being shown as belonging to the cluster:

I already deleted the cluster host and created a new one, but it has the same issue.

The cluster hosts are set to no IP, contain the cluster members and have a few services set.


Additionally, another issue with proxmox clusters (that might be related):
All our cluster hosts became WARN or CRIT, showing this message:

image

On some clusters (that did not use the Proxmox special agent), I could solve this by enabling the special agent and setting all hosts and the cluster node to “use API integration and checkmk agent”.
On other clusters where this is not wanted or possible, the proper setting on the hosts would be “Configured API integrations if configured, else Checkmk agent”, so that only the agent is used. Obviously the cluster host, being a virtual and not a real host, has no Checkmk agent, yet its configuration has to match the configuration of the cluster nodes. If all cluster nodes and the cluster host have “Configured API integrations if configured, else Checkmk agent” set, the message seen in the screenshot appears on the cluster.

This is probably a regression, but I can’t be the only one with proxmox clusters and CRE 2.2, so maybe I’m missing something here…


Inside a cluster object, all cluster nodes and the cluster itself need the same settings for agent and SNMP.
That should already be enforced as of CMK 2.1.
If they differ, you get a warning message that your cluster config differs from the node config.
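
To illustrate the rule above, here is a minimal sketch that compares the datasource settings of a cluster host against its nodes. It works on plain dictionaries, not on a live site. The attribute names `tag_agent` and `tag_snmp_ds`, their values, and the host names are assumptions for illustration; check them against your own host configuration.

```python
# Hypothetical helper: the attribute names below are assumptions,
# not confirmed by this thread.
DATASOURCE_KEYS = ("tag_agent", "tag_snmp_ds")


def datasource_mismatches(cluster_attrs, nodes):
    """Return {node_name: {key: (cluster_value, node_value)}} for every
    datasource setting where a node differs from the cluster host."""
    mismatches = {}
    for name, attrs in nodes.items():
        diff = {
            key: (cluster_attrs.get(key), attrs.get(key))
            for key in DATASOURCE_KEYS
            if cluster_attrs.get(key) != attrs.get(key)
        }
        if diff:
            mismatches[name] = diff
    return mismatches


# Example data (made up): pve2 deviates from the cluster host.
cluster = {"tag_agent": "cmk-agent", "tag_snmp_ds": "no-snmp"}
nodes = {
    "pve1": {"tag_agent": "cmk-agent", "tag_snmp_ds": "no-snmp"},
    "pve2": {"tag_agent": "no-agent", "tag_snmp_ds": "no-snmp"},
}
result = datasource_mismatches(cluster, nodes)
print(result)
```

An empty result means the cluster host and all nodes agree on these settings.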

Yes, they are identical. I tried changing the cluster host to try and get rid of the datasource error, but it did not let me save due to the different config.

Same for me. Worked prior to updating to 2.2.

Used it for all Ceph services, cluster isn’t showing any services, but this error instead:

Could not find any service for your cluster. You first need to specify which services of your nodes shal be added to the cluster. This is done using the Clustered services ruleset.

Didn’t change any rules. When disabling the rule, the (three) hosts each show the services.

On my system the cluster is working as before without any problem.
Both nodes and the cluster object are configured with “Configured API integration and CheckMK agent”.
Discovery inside CMK looks like this.


I fetch the Proxmox special agent data from only one node, but the node without the configured special agent has the same settings as the other one.

Does your cluster host have an IP/DNS name, or is it set to no IP?

For us it’s in fact not only Proxmox clusters, but also Cisco Nexus switch clusters (two switches and then a few fabric extenders connected to both switches, so their interfaces show up on both switches which is why I made a cluster for that).
In this case, the datasource is SNMP (no API integrations, no CheckMK agent).
image

Same thing in the service discovery, they’re all listed as vanished. When I delete the cluster rule and cluster host and then recreate the host, save, activate, create the rule, save, activate, do a discovery on the nodes and the cluster host, they immediately return in vanished state.

And again, the same 234 vanished services are listed as “on the cluster host” when I look at a node’s service discovery:

It has no IP and is also not set to no IP :)
This is the configuration for the cluster object.


Host state output

This is it! Removing the “no IP” setting from the cluster hosts makes them “recover” and the services are back.

I used to set these clusters to “no IP” because iirc, CheckMK complained about not being able to resolve the host name before, and the docs on creating clusters tell you that

If you are dealing with a cluster without a cluster IP address, you will need to take a not-so-comfortable detour, by selecting No IP in the Network Address box for the IP Address Family.

I’ll remove this setting on my clusters for now until this bug is fixed.


Worked for me as well!

I think this is not a bug but a documentation problem, as the cluster now works as expected.
The output “Assumed up, because at least one parent is up” is new.
It would be good if @mschlenker gets this information :)


I am not too sure about that.

We have clusters with their own (DNS) host name and their own IP(s). We also have clusters without that (like Proxmox VE and Cisco switch clusters - anything that has one or more shared resources).

Previously, with clusters without their own IP, you would

  • set the cluster host to “No IP” so CMK wouldn’t try to resolve the cluster’s host name and ping it
  • set the cluster datasource to “No API, no agent” so CMK would not complain that neither API nor agent would respond

This felt quite reasonable, but now fails to save, because the cluster host needs the same datasource configuration as the cluster members.

The new way is to not set “No IP” even if the cluster host does not have a DNS hostname and/or a dedicated IP address. This feels very counterintuitive. CMK will not complain about this, but

  • I have a couple of active checks like SSH that are disabled on “No IP” hosts, I will have to find another way to exclude cluster hosts without an IP address now
  • a cluster host that does have an IP address now shows all services from the cluster node that the HA IP address is currently assigned to, as opposed to requiring you to create a rule in Setup > Services > Service monitoring rules > Clustered services. An example is a pfSense cluster with a CARP IP (although in this case, it would probably make more sense to define all CARP IPs as additional IPs in the cluster host and create ICMP rules for that)

This means to me that when creating a cluster host in CMK,

  • its host name must not be a resolvable DNS host name
  • yet you must not set the cluster host to “No IP”
  • you can then create your Clustered Services rules if applicable
  • if your cluster has one or more HA IPs, you have to add them to Additional IPv4/6 addresses in the cluster host configuration and then create an ICMP check for that if you wish to ping check
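
The checklist above could be sketched as a small helper that builds a cluster host definition. The payload shape is loosely modeled on a REST-style "create cluster host" request; the field names (`host_name`, `folder`, `nodes`, `attributes`, `additional_ipv4addresses`, `tag_address_family`), the host names and the IP address are assumptions for illustration and should be verified against the API documentation of your own site. Nothing is sent anywhere; the code only assembles and prints the data.

```python
import json


def build_cluster_payload(name, nodes, ha_ips=(), folder="/"):
    """Assemble a cluster host definition following the checklist:
    no "No IP" setting, HA IPs listed as additional addresses."""
    if not nodes:
        raise ValueError("a cluster host needs at least one node")
    attributes = {}
    # Deliberately no address family entry such as
    # "tag_address_family": "no-ip" (assumed attribute name).
    if ha_ips:
        # HA/CARP addresses go in as additional IPs so that a separate
        # ICMP check can be pointed at them.
        attributes["additional_ipv4addresses"] = list(ha_ips)
    return {
        "host_name": name,  # should not be a resolvable DNS name
        "folder": folder,
        "nodes": list(nodes),
        "attributes": attributes,
    }


# Hypothetical names and a documentation-range IP address.
payload = build_cluster_payload(
    "pve-cluster", ["pve1", "pve2", "pve3"], ha_ips=["192.0.2.10"]
)
print(json.dumps(payload, indent=2))
```

The Clustered services rules and any ICMP check for the additional addresses are still separate steps and are not covered by this sketch.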