Bug? DCD deletes all hosts when Kubernetes cluster is unreachable

Hello,

We are evaluating Check_mk Enterprise to monitor a Kubernetes cluster:

Check MK version: check-mk-free-docker-2.0.0p11
Kubernetes version: 1.21.1

We managed to set up Check_mk monitoring as documented here: Monitoring Kubernetes

When the main Kubernetes cluster API is not available and check_mk cannot pull new piggyback data, the "Dynamic host configuration" tool deletes all hosts from the inventory (along with their metrics history).

Expected behavior

  • When Kubernetes API is available and there is no host piggyback data == delete host from inventory >> WORKING
  • When Kubernetes API is not available == wait to pull new data from API before deleting existing inventory >> NOT WORKING
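
The two cases above can be sketched as a small decision function. This is only an illustration of the expected behavior, not Checkmk's DCD code: the function name `hosts_to_delete` and its parameters are hypothetical and do not correspond to any Checkmk API.

```python
from typing import Iterable, Set


def hosts_to_delete(
    known_hosts: Iterable[str],
    api_reachable: bool,
    hosts_with_piggyback_data: Set[str],
) -> Set[str]:
    """Return the hosts that may safely be removed from the inventory.

    Hypothetical helper, used only to illustrate the expected behavior.
    """
    # API unreachable: missing piggyback data says nothing about the hosts,
    # so keep the whole inventory until the API answers again.
    if not api_reachable:
        return set()
    # API reachable: a host without piggyback data has really vanished.
    return set(known_hosts) - hosts_with_piggyback_data


inventory = {"pod-a", "pod-b"}
print(hosts_to_delete(inventory, True, {"pod-a"}))  # only pod-b is removed
print(hosts_to_delete(inventory, False, set()))     # nothing is removed
```

The point of the sketch is that "no data because the source is down" and "no data because the host is gone" are treated as two different states.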

Thx, Sandra

Hi @sperez

How DCD behaves depends on how you configure it. Please see the documentation
section about the Dynamic Configuration Daemon and
check your validity and deletion options. Could it be that you have “checked” the option to
automatically delete hosts for which no piggyback data is available?

Regards,
Thomas


To be specific, chapter 4.2 of "Dynamic host configuration - Managing dynamic infrastructures" describes those options. I faced the same problem with data being deleted, but tuning these 3 options fixed it for me.


Hi @openmindz @davidwayne

Thanks for the replies. The solution is exactly what I need, but I do not have the option "Keep hosts while piggyback source sends no piggyback data at all".
Is it because of the checkmk version?


Yes. It looks like the screenshot in the documentation is from 1.6 and not 2.0. The option name has changed in 2.0 but it should work the same way. You can read more about it by activating the “Inline help”.

I have already tried these two features:

But hosts are still being deleted when check_mk cannot reach the Kubernetes API.

I looked through all menus and options, and it seems that the feature "Keep hosts while piggyback source sends no piggyback data at all" does not exist in checkmk 2.0?

Dear @sperez

You need to “add an element” in the “piggyback creation options” for the connector you’re configuring, and there you have the option. This exists in 1.6 and 2.0. Here is a screenshot from my 1.6.0p25 instance:

HTH,
Thomas

Hi @openmindz

I added the new element, but I’m using check_mk 2.0:

No option “Keep hosts while piggyback source sends no piggyback data at all”

Hi @sperez

As far as I can see, this is the same in 1.6 and 2.0. In your screenshot, the option “Delete vanished hosts / Automatically delete hosts without piggyback data” is unchecked, which I believe is what you want. To me this means that hosts are not deleted when there is no piggyback data for them, so in theory this should work.

Thomas

Hi @openmindz

I unchecked "Automatically delete hosts without piggyback data" because otherwise I would lose all metrics history, as hosts are deleted when check_mk is not able to reach the Kubernetes API.
I can confirm that the behavior "Keep hosts while piggyback source sends no piggyback data at all" does not exist/work in check_mk 2.0.

Setup - Agent access rules - Processing of Piggybacked Host Data

Hi @_rb ,

Thx for the reply.

I’m going to try the option “Set period how long outdated piggyback data is treated as valid”
and come back with feedback,

because I didn't find "Keep hosts while piggyback source sends no piggyback data at all".

sorry, my fault.
Option was renamed in 2.0 to “Validity of missing data”

The option “Validity of missing data” didn't work when the Kubernetes API is unreachable and "Automatically delete hosts without piggyback data" is checked.

I was able to run a few tests with the option “Set period how long outdated piggyback data is treated as valid”,
and it does not do what I need.

I feel that there is no feature that does "Keep hosts while piggyback source sends no piggyback data at all" in 2.0.

I can double check that later and let you know.

Hi!

Do you have any update on this issue?

Thx, Sandra
