[Check_mk (english)] New features in Check_MK enterprise

Hi,

We have been running distributed monitoring with check-mk-raw, with
~2000 hosts and ~60000 services spread over 8 slaves, for over 10 years
now. We are planning to move to the Enterprise edition.

Two features are very important for us:
1. service dependencies discovery with root cause detection
2. predictive monitoring.

I know that predictive monitoring is only possible in CMC if the
respective check implements it, as the CPU load check does. But what
about checks that don't? Ceph, for instance?

The performance improvements with CMC and the other features are very
nice, but they are not a reason for the migration at the moment.

Will root cause detection and service dependency discovery ever be
implemented in Check_MK?

Parent-child relationships and Business Intelligence rules are very
tedious to set up and are continuously changing. As you can imagine,
when we replace compute nodes or introduce new tenants with new VMs,
all of this needs to be set up from scratch or modified.

Thanks

···

--
Best regards

Ghassan Elrayah
Mail: ghassan.elrayah@live.de

Predictive monitoring is possible on any metric if you create a custom Nagios plugin that uses RRDtool's PREDICTSIGMA operator and call it as an active check.
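To make the idea concrete, here is a minimal sketch of the decision logic such a plugin could use. It does not call RRDtool. It is a pure-Python stand-in, loosely modeled on what a PREDICT/PREDICTSIGMA pair provides: the mean and standard deviation of past samples taken around the same time of day. The function names, the thresholds, and the `(timestamp, value)` sample format are assumptions made for illustration, not part of any existing plugin.

```python
import statistics

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def predict_band(samples, t, period=86400, periods=7, window=1800):
    """Return (mean, sigma) of past samples near the same time of day.

    samples is a list of (timestamp, value) tuples. For each of the last
    `periods` periods, values inside the `window` seconds before the
    shifted timestamp are collected.
    """
    picked = []
    for k in range(1, periods + 1):
        hi = t - k * period
        lo = hi - window
        picked.extend(v for ts, v in samples if lo <= ts <= hi)
    if len(picked) < 2:
        return None  # not enough history for a standard deviation
    return statistics.mean(picked), statistics.stdev(picked)


def evaluate(samples, t, value, warn_sigmas=2.0, crit_sigmas=3.0):
    """Compare the current value with the predicted band."""
    band = predict_band(samples, t)
    if band is None:
        return UNKNOWN, "UNKNOWN - not enough history"
    mean, sigma = band
    deviation = abs(value - mean)
    if deviation > crit_sigmas * sigma:
        state, label = CRITICAL, "CRITICAL"
    elif deviation > warn_sigmas * sigma:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    return state, f"{label} - value={value:.2f} predicted={mean:.2f} sigma={sigma:.2f}"
```

A real plugin would read the history from the RRD file, print the message, and exit with the returned state so that the monitoring core can treat it like any other active check.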

As for root cause detection… I dunno… do association rule learning on anomaly sets extracted from the data generated by the aforementioned predictsigma check?

I’d do it myself, but I’m overworked and underpaid. ;)

-P

···

On Dec 13, 2019, at 12:12 AM, Ghassan Elrayah ghassan.elrayah@live.de wrote:

checkmk-en mailing list
checkmk-en@lists.mathias-kettner.de
Manage your subscription or unsubscribe
https://lists.mathias-kettner.de/cgi-bin/mailman/listinfo/checkmk-en

The mailing list will be shut down by 31.12.2019. Sign up at the Checkmk Community forum: https://forum.checkmk.com.

https://checkmk.com/download/checkmk2019%20-%20Capacity%20Management.pdf
Some information from the last conference.

I think you should contact tribe29 for the latest status of this feature.

Ralf

Thanks for the link. I couldn't find anything more on it so far. I will ask tribe29. Best

···

On 12/14/19 8:59 AM, Ralf Prengel wrote:

Sounds very interesting. I wanted to avoid writing all these kinds of predictive monitoring scripts, but I think there is no way around it.

I am still wondering what an implementation of service dependency discovery and root cause detection would look like. I am not talking about a specific service, but rather about all the service dependencies across our application and infrastructure landscape.

For example:

We have had situations where some apps go critical, stop providing data, or hang. As admins, we can easily spend 10-20 minutes trying to find out why, and in the end it is a DB query that takes too long and blocks all other operations. So we wrote local checks for such incidents, and we now have a big set of them. Unfortunately, only the developers and the DB admins know the dependencies. Sometimes it is just a network issue, but identifying the bottleneck can also take some time.

So we are thinking of implementing some kind of service dependency discovery with root cause detection. I know it is not trivial. I would like to know whether it is possible with check_mk and what the best approach would be.
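As a rough illustration of the root cause part only, and assuming the dependencies have already been written down by the people who know them, the filtering step could look like the sketch below. It is not a Check_MK feature. The service names and the `depends_on` map are hypothetical, and the sketch does not solve the discovery problem itself.

```python
def transitive_deps(depends_on, service):
    """Collect every service that `service` depends on, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(service, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:  # the seen set also guards against cycles
            seen.add(dep)
            stack.extend(depends_on.get(dep, ()))
    return seen


def root_cause_candidates(depends_on, failing):
    """Return failing services that have no failing dependency beneath them.

    Services inside a dependency cycle implicate each other and are
    therefore not reported by this simple rule.
    """
    failing = set(failing)
    return sorted(
        s for s in failing
        if not (transitive_deps(depends_on, s) & failing)
    )


# Hypothetical dependency map: service -> services it depends on.
depends_on = {
    "webshop": ["api"],
    "api": ["orders-db"],
    "reporting": ["orders-db"],
    "orders-db": [],
}

print(root_cause_candidates(depends_on, {"webshop", "reporting", "orders-db"}))
```

With this input, only `orders-db` is reported, because the other two failing services sit above it in the dependency map.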

Thanks

···

On 12/14/19 1:47 AM, Patrick Gavin wrote:

I’m in the process of reworking my predictive monitoring check to run under check-mk/omd. It was originally written for plain old nagios/pnp4nagios. It’s not currently ready for release, but you can get the gist of it here: [https://github.com/wezelboy/check_predicted](https://github.com/wezelboy/check_predicted)
The root cause analysis stuff is just an idea. I don’t really have time to implement it.
-P

This looks pretty good, I’ll go through it. As for root cause detection, maybe it would be possible to make use of the Apriori algorithm to compute the metrics. I’ve never worked with it before, but I am really interested. First I need to refresh my machine learning knowledge.
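For reference, a small level-wise Apriori sketch over anomaly sets might look like this. The metric names and the example data are made up, and each set is assumed to hold the metrics that were flagged as anomalous in the same time window.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain the whole itemset.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical anomaly sets, one per time window.
anomaly_sets = [
    {"db_query_time", "app_latency"},
    {"db_query_time", "app_latency", "queue_depth"},
    {"net_errors", "app_latency"},
    {"db_query_time", "app_latency"},
    {"cpu_load"},
]

frequent = apriori(anomaly_sets, min_support=0.4)
print(confidence(frequent, {"db_query_time"}, {"app_latency"}))
```

A high-confidence rule only says that two anomalies tend to occur together. It does not by itself say which one is the cause, so the result is a list of candidates to investigate rather than a verdict.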

Thanks a lot


···

On 12/17/19 10:14 PM, Patrick Gavin wrote:

On Dec 15, 2019, at 9:47 PM, Ghassan Elrayah ghassan.elrayah@live.de wrote:
