DRBD: a pain or a blessing?

philippe · June 6, 2024, 8:55am

Hi Forum,

We see occasional failovers of our cluster and are eager to find the root cause.
Could you share your experience: what are possible triggers for a failover, and how do you detect them in the logs?
(I know the virtual appliance should not be clustered, but that information was NOT available when we were installing.)

aeckstein · June 6, 2024, 11:08am

Hi Philippe,

I would remove the clustering and rely on the HA features of the hypervisor:

https://checkmk.atlassian.net/wiki/spaces/KB/pages/9471820/Why+you+should+not+cluster+the+virtual+appliance

Possible triggers are VM snapshots / VM backups, network interruptions, and disk I/O latency.
The appliance has a download option for the cluster log and the kernel log.
To find the cause, you will have to analyze the DRBD, Pacemaker, and Corosync entries in those logs.
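
As a rough starting point for that analysis, here is a minimal Python sketch that pulls out the lines mentioning one of the three components and counts them. It assumes the downloaded logs are plain text and does nothing smarter than a case-insensitive substring match; the function names and the file name `cluster.log` used below are placeholders of mine, not something the appliance defines.

```python
import re
from collections import Counter

# Component names from the post above; matched case-insensitively as substrings.
COMPONENTS = ("drbd", "pacemaker", "corosync")
PATTERN = re.compile("|".join(COMPONENTS), re.IGNORECASE)


def filter_cluster_lines(lines):
    """Yield (component, line) for each line that mentions a component.

    If a line mentions several components, only the first match is reported.
    """
    for line in lines:
        match = PATTERN.search(line)
        if match:
            yield match.group(0).lower(), line.rstrip("\n")


def summarize(lines):
    """Count how many lines mention each component."""
    return Counter(component for component, _ in filter_cluster_lines(lines))


def print_matches(path):
    """Print every matching line of a log file (path is a placeholder)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for component, line in filter_cluster_lines(handle):
            print(f"[{component}] {line}")
```

Calling `print_matches("cluster.log")` on the downloaded file would then list the relevant entries in order, so they can be compared with the time of a failover and with the snapshot or backup schedule.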

system · June 6, 2025, 11:08am

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed. Contact an admin if you think this should be re-opened.