Extend option for hosts which instance should monitor them

sirbaas · September 1, 2020, 10:38pm

Feedback
I don’t know if more people on the forum have the same problem, but I’m missing an option for “Monitor on Site”.
I’m referring to “Specifying to the hosts and folders which instance should monitor them”.

Problem
For example, if you have multiple areas (DMZ, internal, customer, etc.), you can create multiple sites: one slave site for every ‘area’, with each host monitored on a site based on its folder.

Idea
Could we have rules for this, so that you don’t need separate administration of which site a host should be monitored on? These days you buy ‘a service’, and you (the user) don’t care which site the host is monitored on. But the admin does care, because of performance, custom settings, and network rule sets.

simon-mueller · September 1, 2020, 10:42pm

I don’t know if I understood you 100% right, but in a distributed setup you can choose the site which should monitor the host by editing the host and selecting the checkbox to choose from a dropdown menu.

sirbaas · September 2, 2020, 9:37am

You’re completely correct.

For some background: we have two teams, one managing the platform/datacenter and the other managing everything for customer applications. The application team doesn’t care whether a machine is running in DC1 or DC3; they just want to monitor customer hosts/applications.

The platform/datacenter team cares more, because they have to allocate resources, make network and security changes, and, for example, limit cross-datacenter communication.

So when the application team adds more hosts, they add them to the folder customerA or customerB (with Ansible, the API, etc.). The platform/datacenter team then needs to run a recurring automation script to say:
Machine1-customerA;DC1;site1-intern
Machine2-customerA;DC1;site1-dmz
Machine1-customerB;DC2;site2-intern
Machine2-customerB;DC2;site2-dmz
If they use a different name, or the administration is not up to date, this causes unnecessary errors.
Automation such as https://github.com/wrossmann/add-to-check_mk or https://github.com/tribe29/ansible-checkmk
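
The mapping lines above are semicolon-separated records (host; datacenter; site). As a rough sketch of how such a list could be checked before it is pushed anywhere, the snippet below parses the records and collects lines with a wrong field count or a site name that is not known. The names `HostPlacement` and `parse_site_mapping` and the set of known sites are hypothetical, not part of CheckMK or of the linked projects.

```python
from typing import NamedTuple


class HostPlacement(NamedTuple):
    host: str
    datacenter: str
    site: str


def parse_site_mapping(text, known_sites):
    """Parse 'host;datacenter;site' lines; return (placements, errors)."""
    placements, errors = [], []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 3 or not all(fields):
            errors.append(f"line {number}: expected host;datacenter;site")
            continue
        placement = HostPlacement(*fields)
        if placement.site not in known_sites:
            errors.append(f"line {number}: unknown site {placement.site!r}")
            continue
        placements.append(placement)
    return placements, errors


mapping = """\
Machine1-customerA;DC1;site1-intern
Machine2-customerA;DC1;site1-dmz
Machine1-customerB;DC2;site2-intern
Machine2-customerB;DC2;site2-dmz
"""
# Assumed list of sites; in practice this would come from your own setup.
known = {"site1-intern", "site1-dmz", "site2-intern", "site2-dmz"}
placements, errors = parse_site_mapping(mapping, known)
```

Catching a misspelled site name at this stage is one way to avoid the unnecessary errors mentioned above.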

It would really help if we could create a rule in CheckMK saying: CustomerA (regular expression) runs in DC1 on site1-intern, CustomerA-DMZ (regular expression) runs in DC1 on site1-DMZ, and so on. Then, no matter which hosts are added, they will run on the correct site. Not forgetting the production, testing, and development environments, etc.
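
To illustrate the kind of rule being requested, here is a minimal sketch of a first-match-wins list of regular expressions that maps a host name to a site. This is not an existing CheckMK feature; `SITE_RULES` and `site_for_host` are hypothetical names, and the patterns and the customerB entries are assumptions extrapolated from the example above.

```python
import re

# Hypothetical rule list. The first matching pattern wins, so the more
# specific DMZ patterns have to come before the general customer patterns.
SITE_RULES = [
    (re.compile(r"customerA-DMZ", re.IGNORECASE), "site1-dmz"),
    (re.compile(r"customerA", re.IGNORECASE), "site1-intern"),
    (re.compile(r"customerB-DMZ", re.IGNORECASE), "site2-dmz"),
    (re.compile(r"customerB", re.IGNORECASE), "site2-intern"),
]


def site_for_host(host_name, rules=SITE_RULES, default=None):
    """Return the site of the first rule whose pattern occurs in host_name."""
    for pattern, site in rules:
        if pattern.search(host_name):
            return site
    return default
```

With these assumed rules, `site_for_host("Machine1-customerA")` resolves to `site1-intern`, while a host named `web01-CustomerA-DMZ` resolves to `site1-dmz` because the DMZ rule is listed first.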

simon-mueller · September 2, 2020, 7:27pm

Maybe you can solve your problem by using the inheritance feature of Checkmk folders? For every folder you create in WATO you can set properties which will be applied to all hosts inside it. For example, at my former employer I used to create the following structure and apply settings to those folders:

Main Directory (/)
|_ Internal
|_ Customers
    |_ Customer A (monitored at site customer_a)
       |_ Network Devices
       |_ Virtual Machines
       |_ ...
    |_ Customer B (monitored at site customer_b)
       |_ Servers
       |_ Remoteboards
       |_ ....
    |_ Customer C (bigger one, don't set the site (yet))
      |_ Site 1 (monitored at customer_c_site_1)
        |_ Servers
      |_ Site 2 (...)
        |_ Servers
    |_ ...

Thus, you only have to set the folder when using automation, and you could name the folders after, e.g., internal customer IDs or other codenames which all teams understand.

Another advantage of this is that when you create rulesets for, e.g., customer B and you apply the rules to the folder of customer B, you only need to restart the site of customer B and not all sites at the same time. There are Checkmk users who needed even more separation. That’s when the CME was introduced.
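
As a rough illustration of the inheritance idea (not Checkmk's actual implementation), the sketch below looks up the site for a host's folder by walking up the folder path until it finds a folder that has a site set. `FOLDER_SITES` and `effective_site` are hypothetical names; the folder paths and site names mirror the tree above.

```python
# Hypothetical folder -> site settings, mirroring the tree above.
FOLDER_SITES = {
    "/Customers/Customer A": "customer_a",
    "/Customers/Customer B": "customer_b",
    "/Customers/Customer C/Site 1": "customer_c_site_1",
}


def effective_site(folder_path, folder_sites=FOLDER_SITES):
    """Return the site set on the nearest ancestor folder, or None."""
    path = folder_path.rstrip("/")
    while path:
        if path in folder_sites:
            return folder_sites[path]
        path = path.rpartition("/")[0]  # step up one folder level
    return folder_sites.get("/")  # fall back to a setting on the main directory
```

A host placed in `/Customers/Customer A/Network Devices` would resolve to `customer_a`, whereas a host placed directly in `/Customers/Customer C` gets no site from this lookup, matching the "don't set the site (yet)" note in the tree.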

andreas-doehler · September 2, 2020, 7:35pm

The problem is that the property “monitored on site” is a host tag,
and host tags cannot be assigned with rules.

If you manage host creation with the WATO API, you can make the decision at creation time with your own rules. Is there a problem with this approach?
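
A minimal sketch of what "deciding at creation time" could look like: the automation picks the site from its own lookup table and puts it into the request it is about to send. The folder names and the `FOLDER_TO_SITE` table are hypothetical. The field names (`hostname`, `folder`, `attributes`, `site`) are an assumption about the shape of a WATO Web-API `add_host` request and should be checked against the API documentation of your Checkmk version. No request is actually sent here.

```python
import json

# Hypothetical lookup maintained by the platform team: folder -> site.
FOLDER_TO_SITE = {
    "customers/customer_a": "site1-intern",
    "customers/customer_a/dmz": "site1-dmz",
    "customers/customer_b": "site2-intern",
    "customers/customer_b/dmz": "site2-dmz",
}


def build_add_host_request(host_name, folder):
    """Build the request body for creating a host, with the site chosen here."""
    normalized = folder.strip("/")
    site = FOLDER_TO_SITE.get(normalized)
    if site is None:
        raise ValueError(f"no site configured for folder {folder!r}")
    # Assumed request shape; verify against your version's API documentation.
    return {
        "hostname": host_name,
        "folder": normalized,
        "attributes": {"site": site},
    }


request_body = build_add_host_request("Machine2-customerA", "customers/customer_a/dmz")
print(json.dumps(request_body, indent=2))
```

Raising an error for an unknown folder, rather than silently falling back to a default site, keeps a host from landing on the wrong site unnoticed.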

sirbaas · October 3, 2020, 4:13pm

I like the idea of the customer separation; then the users can automate their tasks and just add hosts to the correct folder.
You have to understand that we have 50+ customers, for whom we provide a service (SaaS). The monitoring is only for internal use, to meet the SLA. So the CME doesn’t offer us many additional benefits (I tested CME version 1.6). Besides, for the customer it doesn’t matter whether everything is running in one datacenter or the other.

Side note: to us, creating a site for every customer seems excessive, so we have one site for every environment and for every datacenter. We have 5 different environments, so we would get something like:

Main Directory (/)
|_ Internal
    |_ Virtual Machines
        |_ Development environment
            |_ Datacenter1 (monitored at internal_development_dc1_site1)
            |_ Datacenter2 (monitored at internal_development_dc2_site1)
        |_ Second environment 
            |_ Datacenter1 (monitored at internal_second_dc1_site1)
            |_ Datacenter2 (monitored at internal_second_dc2_site1)
        |_ Third environment
            |_ ........
        |_ Fourth environment
            |_ ........
        |_ Production environment
            |_ ........
    |_ Network
       |_ .........
    |_ Appliances
       |_ .........
|_ Customers
    |_ Customer A
        |_ Development environment (monitored at customers_development_dc1_site1)
        |_ Second environment 
            |_ ........
        |_ Third environment
            |_ ........
    |_ Customer B (monitored at site customer_b)
        |_ Development environment (monitored at customers_development_dc2_site1)
        |_ Second environment 
            |_ ........
        |_ Third environment
            |_ ........
    |_ Customer C (bigger one, don't set the site (yet))
        |_ Development environment (monitored at customers_development_dc1_site2)
        |_ Second environment 
            |_ ........
        |_ Third environment
            |_ ........

My goal is to make it easier for the CheckMK admin to monitor things per datacenter, and to use as simple a hierarchical folder structure as possible for the internal CheckMK users and their automation.

There is no problem with the WATO API as such; it just doesn’t seem to work within our organization.

For some background: we have two departments, one managing all of the infrastructure (including creating and setting up CheckMK as a service for the other department) and one managing all customer applications.

When the application team adds hosts to CheckMK, they don’t care which CheckMK site monitors them. The infrastructure team does care, for capacity and performance reasons. Currently we don’t have a hierarchical folder structure like the one above, so we would need to run a scheduled automation job that checks ~3000 hosts in CheckMK, which seems resource-intensive. Secondly, we would need a second location (for example, the Ansible inventory) to store which datacenter each host is running in, instead of a rule within CheckMK for CheckMK, so to speak.
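
For reference, the comparison step of such a scheduled job could be sketched as below: given the site each host is currently assigned to and the site it should be on according to the external inventory, list the hosts that need to move. `plan_site_changes` and both dictionaries are hypothetical; fetching the data from CheckMK and from the inventory, which is likely where most of the cost lies, is not shown.

```python
def plan_site_changes(current_sites, expected_sites):
    """Compare current and expected host -> site assignments.

    Returns (moves, missing): moves maps a host to (current, expected) when
    the two differ; missing lists expected hosts not present in current_sites.
    """
    moves, missing = {}, []
    for host, expected in expected_sites.items():
        current = current_sites.get(host)
        if current is None:
            missing.append(host)
        elif current != expected:
            moves[host] = (current, expected)
    return moves, missing


# Hypothetical sample data using the names from this thread.
current = {
    "Machine1-customerA": "site1-intern",
    "Machine2-customerA": "site1-intern",
}
expected = {
    "Machine1-customerA": "site1-intern",
    "Machine2-customerA": "site1-dmz",
    "Machine1-customerB": "site2-intern",
}
moves, missing = plan_site_changes(current, expected)
```

The comparison is a single pass over the expected assignments. It still depends on the second source of truth described above, which is exactly what a rule inside CheckMK would avoid.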