Sounds very interesting. I wanted to avoid writing all these kinds of predictive monitoring scripts, but I think there is no way around it.
I am still wondering what an implementation of service dependency discovery and root cause detection would look like. I am not talking about one specific service, but about all the service dependencies across our application and infrastructure landscape.
For example:
We have had situations where some apps go critical, stop providing data, or hang. As admins, we can easily spend 10-20 minutes trying to find out why, and in the end it is a DB query that takes too long and blocks all other operations. So we wrote local checks for such incidents, and we now have a big set of them. Unfortunately, only the developers and the DB admins know the dependencies. Sometimes it is just a network issue, but identifying the bottleneck can also take some time.
So we are thinking of implementing some kind of service dependency discovery with root cause detection. I know it is not trivial, but I would like to know whether it is possible with check_mk and what the best approach would be.
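To make the question concrete, here is a minimal sketch of the root cause part alone, assuming the dependency map is already known. That map is exactly the piece discovery would have to supply. The function name, the service names, and the shape of the map are hypothetical, and none of this is an existing check_mk feature. A critical service counts as a root cause candidate only if none of the services it depends on is critical as well.

```python
def root_cause_candidates(depends_on, critical):
    """Return the critical services that have no critical dependency.

    depends_on: dict mapping a service to the services it depends on
                (hypothetical input, would come from dependency discovery).
    critical:   set of services currently in CRITICAL state.
    """
    candidates = []
    for service in sorted(critical):
        deps = depends_on.get(service, ())
        # A service with a critical dependency is treated as a symptom.
        if not any(dep in critical for dep in deps):
            candidates.append(service)
    return candidates


# Hypothetical landscape: the apps depend on the database.
depends_on = {
    "webshop": ["app_server"],
    "app_server": ["database", "network"],
    "reporting": ["database"],
}
critical = {"webshop", "app_server", "reporting", "database"}
candidates = root_cause_candidates(depends_on, critical)
```

With this example data, only "database" remains as a candidate, because every other critical service depends on something that is itself critical.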
On 12/14/19 1:47 AM, Patrick Gavin wrote:
Predictive monitoring is possible on any metric if you create a custom Nagios plugin that uses RRDtool's PREDICTSIGMA function and call it as an active check.
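A rough sketch of the decision logic such a plugin could use is shown below. It does not call RRDtool. It assumes the historical values (same metric, same time of day on earlier days) have already been read into a list, and it computes the mean and standard deviation itself, which only approximates what PREDICT and PREDICTSIGMA provide. The function name and the thresholds of 2 and 3 sigma are assumptions for illustration.

```python
import statistics

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def sigma_check(history, current, warn_sigmas=2.0, crit_sigmas=3.0):
    """Compare `current` with the mean and standard deviation of `history`.

    history: earlier values of the metric for the same time slot
             (hypothetical input; a real plugin would read them from the RRD).
    Returns (exit_code, status_line) in the usual plugin style.
    """
    mean = statistics.mean(history)
    sigma = statistics.pstdev(history)
    deviation = abs(current - mean)
    if sigma == 0:
        # Flat history: any change at all counts as a deviation.
        state = OK if deviation == 0 else CRITICAL
    elif deviation > crit_sigmas * sigma:
        state = CRITICAL
    elif deviation > warn_sigmas * sigma:
        state = WARNING
    else:
        state = OK
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[state]
    line = f"{label} - value={current:.2f} predicted={mean:.2f} sigma={sigma:.2f}"
    return state, line


# Hypothetical history with mean 10.5 and sigma 1.0.
history = [10.0, 12.0, 11.0, 9.0, 10.5]
state, line = sigma_check(history, 13.0)
```

A real plugin would print the status line and exit with the returned code, so that it can be scheduled as an active check.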
As for root cause detection… I dunno… do association rule learning on anomaly sets extracted from the data generated by the aforementioned predictsigma check?
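As a very small illustration of that idea, the sketch below mines rules of the form "if service A is anomalous, service B is anomalous too" from anomaly sets, one set per time window. It only looks at pairs of services and uses plain support and confidence thresholds. A real setup would more likely use a full Apriori or FP-growth implementation. All names and thresholds are hypothetical, and a rule shows co-occurrence, not proven causation.

```python
from collections import Counter
from itertools import combinations


def pair_rules(anomaly_sets, min_support=0.5, min_confidence=0.8):
    """Return sorted rules (antecedent, consequent, support, confidence).

    anomaly_sets: list of sets; each set holds the services that were
                  anomalous in one time window (hypothetical input).
    """
    total = len(anomaly_sets)
    single = Counter()
    pair = Counter()
    for window in anomaly_sets:
        services = sorted(set(window))
        single.update(services)
        pair.update(combinations(services, 2))
    rules = []
    for (a, b), count in pair.items():
        support = count / total
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical anomaly windows.
windows = [
    {"db_query_time", "app_latency"},
    {"db_query_time", "app_latency", "net_loss"},
    {"app_latency"},
    {"db_query_time", "app_latency"},
]
rules = pair_rules(windows)
```

With these windows the only rule that passes both thresholds is db_query_time -> app_latency: whenever the DB query time was anomalous, the app latency was anomalous as well, but not the other way round.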
I’d do it myself, but I’m overworked and underpaid.
On Dec 13, 2019, at 12:12 AM, Ghassan Elrayah firstname.lastname@example.org wrote:
We are running distributed monitoring with check-mk-raw with ~2000 hosts
and ~60000 services spread over 8 slaves for over 10 years now. We are
planning to go enterprise.
There are two features that are very important for us:
service dependencies discovery with root cause detection
I know that predictive monitoring is only possible in CMC if the
respective checks have implemented it, like CPU load. But what about checks
that don't? Ceph, for instance?
The performance improvements with CMC and the other features are very nice,
but they are not a reason for the migration at the moment.
Will root cause detection and service dependency discovery
ever be implemented in Check_MK?
The parent/child relationships and Business Intelligence are very tedious to
set up and are continuously changing. You can imagine that when replacing some
compute nodes or introducing new tenants with new VMs, all of this needs to
be set up from scratch or modified.
-- Best regards