How do you handle automatic agent updates via the Agent Bakery? (CME, multi-customer distributed Checkmk)

Hi all

We are running a distributed monitoring setup in which we monitor our customers’ infrastructure and services on dozens of different sites for dozens of customers. Keeping Checkmk agents up to date takes considerable effort, even with external automation, so we want to try out automatic agent updates.

Our situation is as follows:

  • There are multiple isolated customers
  • Each customer has their own agent with unique encryption secret
  • Every customer has dedicated system engineers assigned who support them, perform changes and maintenance, set up new servers, install and update Checkmk agents, and so on

We want(ed) those dedicated system engineers to be able to bake and sign agents for their customers themselves. Due to security and organisational restrictions, this requires a separate signature key for each customer.

So far so good - at least we thought :wink:

What we gathered from our testing:

  • Using separate signature keys is supported. However, whenever someone signs agents with one signature key, every host that does not accept this key goes WARN with “No valid signature found”. This is not configurable. (Since one engineer is responsible for multiple customers, every agent of the customers not using the chosen signature key will “break”.)
  • There seems to be no way to require agents to be signed. If someone simply re-bakes the agents without signing them, every host of their customers goes WARN. (This is especially bad if an admin messes up and bakes all agents globally without signing them…)
  • Using a single global signature key solves the first issue. Baking agents only via automation, and dropping the requirement that engineers can bake agents themselves, solves the second. Neither is optimal, though.
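For the automation route in the last point, one way to make “forgot to sign” impossible is to let the automation issue only bake-and-sign calls, never a plain bake. Below is a minimal Python sketch, assuming the Checkmk REST API’s bake-and-sign action at `/domain-types/agent/actions/bake_and_sign/invoke` with a JSON body containing `key_id` and `passphrase`, and an automation user authenticating via `Authorization: Bearer <user> <secret>`. Please verify the endpoint and body against the REST API documentation of your Checkmk version. The function name and all example values are hypothetical, and the sketch only builds the request without sending it.

```python
import json
import urllib.request


def build_bake_and_sign_request(base_url, user, secret, key_id, passphrase):
    """Build (but do not send) a REST API request that bakes AND signs agents.

    Assumption: the Checkmk REST API offers the action
    /domain-types/agent/actions/bake_and_sign/invoke taking
    {"key_id": ..., "passphrase": ...}. Check your version's API docs.
    """
    if not passphrase:
        # Refuse outright: an unsigned bake is exactly what we want to prevent.
        raise ValueError("refusing to bake without a signing passphrase")
    url = base_url.rstrip("/") + "/domain-types/agent/actions/bake_and_sign/invoke"
    body = json.dumps({"key_id": key_id, "passphrase": passphrase}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Checkmk automation users authenticate as "Bearer <user> <secret>".
            "Authorization": f"Bearer {user} {secret}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )
```

The request could then be sent with `urllib.request.urlopen(...)`, for example with a `base_url` of the form `https://monitoring.example.com/mysite/check_mk/api/1.0`. Note that this only enforces signing with one global key; it does not give us per-customer keys.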

What we had hoped for was functionality to link agents to signature keys, so that only the hosts using a specific key actually get updated. Furthermore, in our case it would be mandatory to somehow enforce agent signing. We would not want to get 8000+ events because someone craves their Friday evening beer and forgets to sign the agents :wink:

Have any of you implemented automatic updates for a setup similar to ours? How do you handle this?
Are we missing something?

Have a nice day :partying_face: