Distributed Monitoring - Cascading Livestatus

marbaa · January 10, 2022, 6:03pm

Hi there again,

On kb.checkmk.com there is an article called Cascading Livestatus. I thought it would suit my needs (my satellite does not have a direct connection to the master), but the explanation doesn’t make sense to me:

A distributed setup where you have remote sites that are not directly reachable
These remote sites are only reachable through a single “master” site
You use the “masters” for configuration of all the related sites
The central site is only used as central operating site (overview, reporting) and not for configuration

Step-by-step guide

Scenario: We have a slave which is monitoring some hosts. But this slave site is not directly reachable. Therefore, we need a so-called viewer Site. This viewer Site will get all data for our slave Site via the master Site

The first two points contradict each other. If I have a remote site that has no direct connection to the master site, how can a “viewer” site created on the server where the master is running see a slave that has no direct connection?

If it can be built like this, then it is awesome:

Could you please enlighten me as to what it is about?

andreas-doehler · January 10, 2022, 7:53pm

The article from the knowledge base is meant for a setup where you want to split configuration and viewing. That’s a common problem in really big setups.

Unless I’m completely wrong, what you want is not possible.
If you only want to see the data from the remote slave, one solution would be for the viewer host to fetch this data with livedump and provide the status to the master.

gstolz · January 10, 2022, 8:57pm

If you have SSH access from the master to the first slave, then you could build an SSH tunnel and forward the Livestatus and HTTPS ports from your second (unreachable) slave.
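
A minimal sketch of what such a tunnel could look like, written as a small Python helper that only assembles the `ssh` command line. All host names, the user name, and the local ports are hypothetical, and it assumes Livestatus is exposed over TCP on the second slave on port 6557; adjust to whatever your site actually uses.

```python
import shlex

# Assumption: Livestatus over TCP is enabled on the unreachable slave on this port.
LIVESTATUS_PORT = 6557
HTTPS_PORT = 443


def build_tunnel_command(jump_host, target_host,
                         local_livestatus_port, local_https_port,
                         user="monitoring"):
    """Assemble an ssh command that forwards two local ports through jump_host.

    -N  : do not run a remote command, only forward ports
    -L  : local_port:target_host:target_port, resolved from jump_host's side
    """
    return [
        "ssh", "-N",
        "-L", f"{local_livestatus_port}:{target_host}:{LIVESTATUS_PORT}",
        "-L", f"{local_https_port}:{target_host}:{HTTPS_PORT}",
        f"{user}@{jump_host}",
    ]


# Hypothetical names: run on the master, tunnel via the first slave to the second.
command = build_tunnel_command("slave1.example.com", "slave2.internal", 16557, 10443)
print(shlex.join(command))
# ssh -N -L 16557:slave2.internal:6557 -L 10443:slave2.internal:443 monitoring@slave1.example.com
```

With the tunnel up, the master would point its distributed monitoring connection at `localhost:16557` (Livestatus) and `https://localhost:10443` (web interface) instead of the slave's real address.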

Heavy · January 10, 2022, 11:18pm

The KB article describes exactly what you want to achieve, except that in your second picture you swapped the roles of “master” and “viewer” compared to the KB article.

I am using “master” and “viewer” as in the KB article:

The article explains how a central “viewer” site can show all monitoring data from all slave sites, even if no direct TCP connection between the viewer and a slave site is possible (e.g. due to routing, NAT, or firewall restrictions). It uses a master site that can reach the slave sites and can itself be reached from the viewer site. On the master, you set up distributed monitoring and use the liveproxy daemon. With the configuration shown in the article, the Livestatus connection to the slave site can itself be exported over a TCP port on the master site.
In a second step, configure distributed monitoring on the viewer site and direct it to the exported port on the master site. Thus, the viewer can directly communicate with the slave over the livestatus protocol.
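
To illustrate what "communicating over the Livestatus protocol" means here, a minimal Python sketch of a Livestatus query sent over TCP, as the viewer side would do against the exported port. The helper names are made up for this example, and the host and port in the note below are hypothetical placeholders for whatever you export on the master.

```python
import json
import socket


def build_query(table, columns):
    """Build a Livestatus GET query that asks for JSON output."""
    lines = [
        f"GET {table}",
        "Columns: " + " ".join(columns),
        "OutputFormat: json",
    ]
    # A Livestatus request is terminated by an empty line.
    return "\n".join(lines) + "\n\n"


def query_livestatus(host, port, query, timeout=10.0):
    """Send one query over TCP and return the decoded JSON rows."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8"))
        # Close our sending side to mark the request as complete,
        # then read until the peer closes the connection.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

For example, `query_livestatus("master.example.com", 6560, build_query("hosts", ["name", "state"]))` would return the slave's hosts and their states, assuming 6560 is the port exported on the master for that slave.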

Note that this configuration is only possible with the Checkmk Enterprise Edition on the master site. There is no liveproxyd in the Raw Edition.

The “viewer” site is not created on the same host as the master sites. In fact, this scenario is typically used with several master sites, and with several slave sites behind each master.

marbaa · January 11, 2022, 8:18am

Thanks for trying to explain it, Heavy. However, I must be slow on the uptake: I still don’t understand it, and it doesn’t make sense to me.

This is the part that contradicts itself, IMO. If I have a master site that can reach the slave sites, then everything is fine and I don’t need to think about cascading Livestatus.

In my understanding, the master site (main instance) is the site with centralized configuration, where I see all hosts from all slaves that are connected to it via distributed monitoring, do central configuration, etc.

I believe the article is not about my situation.

The following picture shows our environment (city names are changed). There is only one godly master site and a lot of distributed monitoring slaves. The slave site Zakopane has no direct network connection, but we have two empty jump servers in between with iptables forwarding configured, so the Zakopane site is able to log in to the Kosice master site. That works almost perfectly.

So, could cascading Livestatus help me get rid of iptables? That is, by installing another site on jump1 that can log in directly to Kosice, installing another site on jump2 that connects to jump1 via the liveproxy daemon, and finally having Zakopane connect to jump2 via the liveproxy daemon?

Heavy · January 11, 2022, 9:26am

The “viewer” instance in the KB article can be seen as a “super-master” which cannot connect to any slave site directly, but can connect to all master sites.

Yes, you could get rid of iptables for the Livestatus connection by connecting your master site to the slave3 site through two intermediate liveproxyd instances. But I see no real advantage in that. You would lose the ability to configure slave3 centrally (unless you keep maintaining some kind of port forwarding or reverse proxying for the configuration channel), and you would need to maintain two additional Checkmk instances. I would not do it.

The setup described in the KB article is of most value when you already have multiple distributed monitoring master-slave setups, and you want to set up one central viewer site to display all monitoring results in one place. The article shows that this is possible even without a direct TCP connection between the viewer and the slave sites.
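
As a rough illustration of that layout, here is a small Python sketch that models which site can open a TCP connection to which, and finds the chain a viewer would have to go through. The site names are invented for the example and nothing here is Checkmk API; it only shows the "viewer reaches slaves via a master" idea.

```python
from collections import deque

# Hypothetical topology: each site maps to the sites it can connect to directly.
REACHABLE = {
    "viewer": ["master1", "master2"],
    "master1": ["slave1a", "slave1b"],
    "master2": ["slave2a"],
}


def livestatus_path(topology, source, target):
    """Breadth-first search for the shortest chain of sites from source to target.

    Returns the list of sites on the path, or None if target is unreachable.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in topology.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(livestatus_path(REACHABLE, "viewer", "slave2a"))
# ['viewer', 'master2', 'slave2a']
```

Every intermediate entry in the returned path is a site that would have to export the next hop's Livestatus connection, which is why a longer chain means more instances to maintain.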
