Anyone using the Checkmk Ansible collection for activation of changes in a multisite environment?

Hi all,

what I want to do is activate changes on all sites from an Ansible playbook, so I simply use this task:

    - name: Activate Changes
      checkmk.general.activation:
        server_url: "https://cmk-master"
        site: master
        automation_user: automation_user
        automation_secret: 'automation_secret'
        force_foreign_changes: true
        redirect: true

With no sites given (I also tried listing all of our sites), the result is always the same:

The task returns immediately after the activation has started and not when it’s finished.

So, either I’m doing something wrong or I’m misunderstanding the documentation of the module:

redirect: If set to `true`, wait for the activation to complete

Are you using this module? What is your experience?

Thanks!

I don’t use the collection for activating changes, but your question made me curious. The Checkmk REST API has two relevant endpoints/functions:

  • an HTTP POST to /domain-types/activation_run/actions/activate-changes/invoke that starts a background activation & returns a JSON object containing a field called id which refers to the activation job
  • an HTTP GET to /objects/activation_run/{activation_id} that queries the status of a specific activation job

Looking at the collection’s source code the checkmk.general.activation module does exactly the HTTP POST to start the activation in the background. It returns the whole JSON object retrieved from the POST as the action’s result. Therefore if you capture the result with e.g. register: activation_ you should now have access to the ID as {{ activation_.id }}.

This you can use together with Ansible’s ansible.builtin.uri action in a second task to query the activation status. You can capture its output again with register: … & use the until:, delay:, retries: keywords to loop the task until the returned result indicates that the activation has finished.
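For readers who prefer to see the loop spelled out, here is a minimal sketch of the same until/delay/retries idea in Python instead of Ansible tasks. The helper names `activation_status_url` and `wait_for` are made up for illustration, and the `check_mk/api/1.0` prefix is an assumption about how the REST API is mounted on the site; only the `/objects/activation_run/{activation_id}` path comes from the post above.

```python
import time

# Assumption: the REST API lives under <server>/<site>/check_mk/api/1.0
API_PREFIX = "check_mk/api/1.0"


def activation_status_url(server_url, site, activation_id):
    """Build the status URL for one activation run (hypothetical helper)."""
    base = server_url.rstrip("/")
    return f"{base}/{site}/{API_PREFIX}/objects/activation_run/{activation_id}"


def wait_for(check, retries=30, delay=5.0, sleep=time.sleep):
    """Call check() up to `retries` times, pausing `delay` seconds in between.

    Returns True as soon as check() is truthy, False if all attempts are
    used up. This mirrors Ansible's until/retries/delay keywords.
    """
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:  # no pause after the last attempt
            sleep(delay)
    return False
```

In a real script, `check` would query the URL from `activation_status_url(...)` and decide from the response whether the activation is done; it is passed in as a callable here so the loop itself stays independent of any HTTP library.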

Unfortunately there’s no mention of /objects/activation_run in the collection’s source code. Therefore it really seems that there’s no convenience function included for this. Neither is there some kind of “wait for completion” parameter in the checkmk.general.activation module.

Edit: fixed name of action to use for custom URI access

Hi Moritz,

that was the first idea I had for a workaround, but it is not possible because no id is returned. Here’s what the activation module returns:

ok: [localhost] => {
    "activation_result": {
        "changed": true,
        "failed": false,
        "msg": "302 - Redirected."
    }
}

From what I understand from the module documentation, the parameter “redirect” should wait for the command to complete (as it does e.g. for baking the agents).

But in case of a multisite setup, it simply waits for the master to complete the activation. Probably that’s by design and wanted, but not what we need :slight_smile:

So, I guess I’ll create my own activation plugin.

Cheers,
Christian

Yeah, that’s a bummer. You can also use ansible.builtin.uri for the initial POST to start the activation process, I guess. Or wrap both tasks in your own plugin, sure.

I bet a pull request against checkmk.general would also be welcome with such functionality.

Let’s see, will try to implement this, but not sure if it will be good enough for creating a pull request. I’m pretty sure my python skills are far far away from being good enough for that :slight_smile:

It is not a problem at all to do the activation in a multisite environment with checkmk.general.activation; I have a multisite setup and use it a lot. You need to give the master site name to the site key, that’s all. Any change for all sites will be activated.

Hi Oliver,

of course, at the end the activation is done on all sites.

The issue I’m facing is that it does not wait for it to complete, it immediately returns when the master site has finished its activation although site synchronization/activation is still running.

@mbunkus did do some debugging of the module and found something strange which I can’t explain with my limited python knowledge…

In the module util method _fetch (of class CheckmkAPI) there is this code:

        # Better translate to json later and keep the original response here.
        content = response.read() if response else ""

which, for whatever reason, always assigns the empty string to content, although response is defined. Debugging the content of response.read() gives the whole expected return data of the API call, but it’s never assigned to the variable content.

And this is now where I’ve no clue why this happens… when changing the above one-line statement to a simple if/then/else, content is assigned and returned to the module, but then still no content is returned to the playbook, because the utility function result_as_dict removes any data other than changed, failed and msg.

So, I still see the only chance to fix this for me by writing my own module :frowning:

The best would be: you enhance your Python knowledge and become a contributor, do not write your own module…

Wait, what!? Do you mean that the original…

  content = response.read() if response else ""

…and your supposed transform of…

  if response:
    content = response.read()
  else:
    content = ""

…behave differently!? Or did you convert it to something else?

I guess I am late to the party, but I’ll look into this as well. I remember some challenges back in the day, when the module was initially built. I will get back to you, if I find anything, but do not let me stop you from your own research and learning.

@robin.gierse no worries, late guests sometimes are the best :slight_smile:

For the above issue I may have found why it is happening… simply because response.read() returns an empty string in some cases.
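The thread does not say why read() came back empty. One common cause with file-like response objects, offered here only as a possible explanation, is that the body was already consumed by an earlier read (for example a debug statement). The sketch below uses `io.BytesIO` as a stand-in for the real response object; it shows that the one-line conditional and the if/else form behave identically, and that a second read() on the same stream yields nothing.

```python
import io


def read_ternary(response):
    # The original one-line form from _fetch
    return response.read() if response else ""


def read_if_else(response):
    # The expanded form; semantically the same as the ternary
    if response:
        content = response.read()
    else:
        content = ""
    return content


payload = b'{"id": "example"}'  # made-up body, for illustration only

# Both forms return the full body when given a fresh stream
assert read_ternary(io.BytesIO(payload)) == payload
assert read_if_else(io.BytesIO(payload)) == payload

# Reading the same stream twice: the first read drains it
stream = io.BytesIO(payload)
first = stream.read()   # full payload
second = stream.read()  # empty, nothing left to read
```

If that is what happened, a debug read placed before the assignment would show the data while leaving the assignment itself with an empty result, whichever of the two forms is used.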

And regarding the issue about the activation module not waiting for all sites, I’ve already a solution too.

My first attempt was to simply implement 2 further lookup functions (similar to your sites and site) for activations and activation. These functions are somehow working but not in a way that I would need. The reason for that is that the API call /objects/activation_run/{activation_id} only returns current running activations. As soon as the activation has finished it returns a http status of 404 which causes the lookup function to fail. Of course one could implement this lookup function differently (e.g. not returning an error in case the activation is already done), but then it doesn’t follow the idea of all the other lookup functions.
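To illustrate the “not returning an error” variant, here is a rough sketch of how a status check could interpret the responses. The function name `activation_state` is hypothetical, and the `extensions.is_running` field is an assumption about the response body; the 404-after-completion behaviour is the one described above.

```python
def activation_state(status_code, body=None):
    """Map an HTTP response for an activation run to 'running' or 'finished'.

    Caveat: a 404 is also what an ID that never existed would produce, so
    this is only safe for an ID you have just received from starting an
    activation yourself.
    """
    body = body or {}
    if status_code == 404:
        # Per the observation above: finished runs are no longer listed
        return "finished"
    if status_code == 200:
        # Assumption: the body carries extensions.is_running; if the field
        # is missing, treat the run as still in progress
        running = body.get("extensions", {}).get("is_running", True)
        return "running" if running else "finished"
    raise RuntimeError(f"unexpected HTTP status {status_code}")
```

Such a function could be repeated in a loop until it reports "finished", but as said, it treats an error status as success, which is why it does not fit the pattern of the other lookup functions.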

Another problem would be with the activations lookup (which should simply find all activations). This is working in general, but… I found no way to find out which activation (if more than one is returned) was the one that I had initiated before.

So what I ended up doing is starting with your activation module plus some modifications of the CheckmkAPI class, so that it returns enough data to let me reliably find out my activation_id in all cases (as there is a difference depending on whether you use “redirect: true” or not).

And with that found activation_id I’m now using the background_job API to find the status of the activation and repeat until the background_job finishes.

Cheers,
Christian

Thank you for your extensive research @MasopustC! :folded_hands:

I am not sure I agree with your approach, as the behavior discussed feels like a bug to me, which should be easily fixable. I mean, your approach works, but I would like to fix it.

Anyways, would you consider creating a PR against the collection with your changes, if that makes sense? Then we could discuss the approach and if it can improve the collection.

For those reading along: We implemented two new lookup modules based on @MasopustC’s code (with his explicit permission of course): Add lookup modules for activations by robin-checkmk · Pull Request #973 · Checkmk/ansible-collection-checkmk.general · GitHub
The pull request is pending final review and the modules will be released with an upcoming release.

Regarding the activation module itself, I am currently uncertain if we actually have a situation or not. Happy to work on it with a PR or an issue on GitHub (to keep things clean and understandable, as this thread already has some history).

Hello Robin,
whether there’s a case with the activation module is up to you to decide :slight_smile:

As we discussed offline, the issue is with the redirect as the underlying used session object would raise an exception if it receives more than 30 redirects, which usually happens in a distributed environment (at least in our setup it happens).
But tbh, I’m not sure if it will be enough to simply increase the number of allowed redirects in the ansible module (sorry, had no time to try this so far), at least in a small test script it worked.
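To make the redirect limit concrete, here is a small sketch of following redirects by hand with a configurable limit. It assumes the server keeps answering with a redirect status while the activation is still running and with a non-redirect status once it is done; `follow_redirects` and the `get` callable are hypothetical and not part of the collection. If the session object in question is a requests.Session (which is not stated here), its `max_redirects` attribute, 30 by default, would be the corresponding setting.

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}


class TooManyRedirects(Exception):
    """Raised when the redirect limit is exhausted."""


def follow_redirects(get, url, max_redirects=30):
    """Follow redirects until a non-redirect status arrives.

    `get(url)` must return a (status_code, location) tuple; it is passed in
    so the loop does not depend on a particular HTTP library.
    Returns (final_status, number_of_redirects_followed).
    """
    hops = 0
    status, location = get(url)
    while status in REDIRECT_CODES:
        if hops >= max_redirects:
            raise TooManyRedirects(f"gave up after {hops} redirects")
        hops += 1
        url = location
        status, location = get(url)
    return status, hops
```

With the default limit of 30, a server that redirects 40 times before finishing makes this raise, while a higher limit lets the same sequence run to completion. Whether simply raising the limit is enough in the module is exactly the open question.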

Sadly I don’t have enough time to do more tests at the moment…

Regards,
Christian

It looks like the whole “redirect” topic is not straightforward, so whoever can, please open an issue over at GitHub, so we can track it there.

Thanks everyone for the constructive discussion here!