Master/Slave? Backup VM? Live Status? I need a resilient backup

Yes, but then you would be missing data and alerting while restoring to a new instance?

Availability sets might be our way forward.

What happens to the agents if the Checkmk server goes down? Do they hold information that they would then forward the next time Checkmk is back online?

Or would they just start sending current data? I want to know what happens to the missing data should the Checkmk server go down.


Just one observation about using Azure: with Premium Disk you get 99.9% availability. If that is not acceptable, you should proceed with designing an HA solution. The easiest approach, from my perspective, is to run two monitoring servers and monitor your devices from both locations. Agent registration would only need to be done on the main server, and a backup would need to be copied to the secondary one at least once an hour. Contact me by DM if you want to take this discussion further.
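To put the 99.9% figure in perspective, here is a rough back-of-the-envelope sketch in Python. It converts an availability percentage into a yearly downtime budget (99.9% works out to roughly 8.76 hours per year) and estimates what a second server could add. The function names are made up for illustration, and the two-server number assumes the servers fail independently, which is optimistic if both sit in the same region or share storage.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Expected unavailable hours per year for a given availability (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(availability: float, servers: int) -> float:
    """Availability of N redundant servers.

    Assumption: failures are independent, so the setup is only down
    when every server is down at the same time.
    """
    return 1.0 - (1.0 - availability) ** servers


if __name__ == "__main__":
    single = 0.999  # the 99.9% figure quoted for a single VM
    pair = combined_availability(single, 2)
    print(f"single server: {downtime_hours_per_year(single):.2f} h/year down")
    print(f"two servers:   {downtime_hours_per_year(pair):.4f} h/year down")
```

Note that this only covers the VM being reachable. With an hourly backup copy, the secondary can still be up to an hour behind on configuration and history when it takes over.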

Checkmk agents don't retain data; the pushed or pulled data reflects the moment of data collection.