High availability of check_mk (two masters, many slaves)

ano · May 18, 2020, 7:47am

Hello team,

I have been using check_mk for almost a year now, and I have some experience using it as a monitoring system.

I have an installation of check_mk on virtual machines using distributed monitoring (1 master, 2 slaves): 1 master and 1 slave in DMZ 1, and 1 slave in DMZ 2.

I export data to InfluxDB and visualize it in Grafana.
I also use the feature that creates a ticket in ServiceNow and Jira when a critical service event occurs.

I have some questions about how to make check_mk highly available.
Imagine that on Friday at 22:00 check_mk goes down (one of the slaves in the DMZ is down because of infrastructure problems). During the weekend I lose all monitoring and metrics, and I can't raise a ticket to notify the on-call staff about the infrastructure problem. That is a very big problem.

So I am looking for a way to fail over when one of the slaves or the master is down: the backups have to be started automatically to take over the monitoring.

What I have:
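A minimal sketch of that failover idea, not a tested HA solution: a watchdog probes the primary site over TCP and triggers a takeover action once after several consecutive failures. The class and function names are hypothetical, and the probe target (for example the Livestatus TCP port of the primary, often 6557 in distributed setups) and the takeover action (for example starting the backup site) are assumptions that depend on the actual setup.

```python
import socket
from typing import Callable


def site_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverWatchdog:
    """Calls on_failover once after max_failures consecutive failed probes."""

    def __init__(
        self,
        probe: Callable[[], bool],
        on_failover: Callable[[], None],
        max_failures: int = 3,
    ) -> None:
        self.probe = probe
        self.on_failover = on_failover
        self.max_failures = max_failures
        self.failures = 0
        self.failed_over = False

    def tick(self) -> bool:
        """Run one probe; return True if this call triggered the failover."""
        if self.probe():
            # A successful probe resets the failure counter.
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures and not self.failed_over:
            self.failed_over = True
            self.on_failover()
            return True
        return False
```

Calling `tick()` from a scheduler every minute with `max_failures=3` would tolerate short network glitches before the backup takes over. Requiring several consecutive failures is a deliberate choice here to avoid both sites monitoring and notifying at the same time after a single missed probe.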

1 Master --> 1 Master Backup
1 Slave --> 1 Slave backup
2 Slave --> 2 Slave backup

A job runs every day to align the backup machines with the active ones (omd restore, omd start, disable checks and notifications).

Any suggestions? Or does anyone have experience with a similar setup?
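The daily alignment job could be sketched roughly as follows. This only builds the commands and does not run them. The site name, the archive path, the `omd stop` step and the `--reuse` flag are assumptions to verify against `omd restore --help` on the installed version. The three external commands are the classic Nagios-style ones for switching off notifications and active checks, and sending them through Livestatus is also an assumption.

```python
import time
from typing import List, Optional


def sync_commands(site: str, archive: str) -> List[List[str]]:
    """Commands (argv lists) to run on the backup machine, in order."""
    return [
        ["omd", "stop", site],                          # assumed: site must be stopped first
        ["omd", "restore", "--reuse", site, archive],   # assumed flag: overwrite existing site
        ["omd", "start", site],
    ]


def standby_commands(now: Optional[int] = None) -> List[str]:
    """Livestatus command lines that keep the restored site passive."""
    ts = int(time.time()) if now is None else now
    names = (
        "DISABLE_NOTIFICATIONS",
        "STOP_EXECUTING_HOST_CHECKS",
        "STOP_EXECUTING_SVC_CHECKS",
    )
    return [f"COMMAND [{ts}] {name}" for name in names]


if __name__ == "__main__":
    # Dry run: print what the job would do for a hypothetical site.
    for argv in sync_commands("mysite", "/tmp/mysite.tar.gz"):
        print(" ".join(argv))
    for line in standby_commands():
        print(line)
```

The standby commands come after `omd start` because the site has to be running before it can accept them. On failover, the matching enable and start commands would be sent to reverse this step.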

kribbit · May 18, 2020, 8:08am

Hi Ano,
as far as I understand, you are not using the Checkmk appliance. If you were, you could set up an HA system right away, as the appliance has all the required features.

https://checkmk.de/cms_appliance_usage.html
or

If you use the non-commercial version, you need to set things up on your own. Personally, I would say that if you require HA and want it supported by Tribe, you should investigate the option of upgrading to the Tribe appliances, either virtual or hardware.

That way, you ensure support by the vendor.

ano · May 18, 2020, 8:45am

Hello Kribbit,
At first we used the raw edition, then we upgraded to the paid version.
I am using the paid version of check_mk (CEE 1.6.p9).

I've installed all the check_mk instances on Red Hat Enterprise Linux 7 in a hybrid cloud infrastructure (Red Hat CloudForms / VMware, on premises).

Kr
Andry

kribbit · May 18, 2020, 9:08am

Hi Ano,

you should be able to set up the clustering at the appliance level in that case.

You need to make sure that the cluster connection runs over a different NIC than your live interfaces. According to the documentation, bonded interfaces on a VM will not do you any good. The following picture shows a HW-based setup with bonded interfaces. In your case, you wouldn't use bonded interfaces. You will still need to set up a cluster IP. In the distributed setup, your master will connect to the slaves via the cluster IP. Please keep in mind that the actual monitoring will still be issued from the “original” IPs and not from the cluster IP.

If you set it up this way, the clustered appliance will take over in case of trouble. You do not have to sync things manually.

I like pictures; they say more than a thousand sentences.

marass · May 18, 2020, 9:37am

If you want to build it on your own, the link mentioned in "[Check_mk (english)] Regarding OMD in HA setup" is still a good starting point.

ano · May 18, 2020, 11:57am

What I want is to implement my own; I am looking for the enterprise way to do it properly and correctly.
