Hey @annbe
This is a known regression in 2.3.0 affecting distributed setups with parallel activations. You're not alone, see: Bug: activation often hangs since 2.3.0
Quick cleanup when it's stuck (as @andreas-doehler suggested):
ls -lah ~/var/check_mk/background_jobs/ | grep activate
rm -R ~/var/check_mk/background_jobs/activate-changes-scheduler-<UUID>/
rm -R ~/tmp/check_mk/wato/activation/<old-UUID>/
Full procedure: https://checkmk.atlassian.net/wiki/spaces/KB/pages/9470533/
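Before removing anything, it can help to see which activation directories are actually old. A minimal sketch, assuming a 60-minute threshold is a sensible default (the helper name `list_stale_dirs` and the threshold are my own, not part of the KB procedure). It only lists, it deletes nothing:

```shell
# Hypothetical helper: print the direct subdirectories of $1 whose
# modification time is older than $2 minutes (default: 60).
# Read-only: nothing is removed.
list_stale_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -mmin "+${2:-60}"
}
```

As site user that would be e.g. `list_stale_dirs ~/tmp/check_mk/wato/activation 60`. Make sure no activation is running before you delete what it prints.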
Diagnose the root cause (picking up @mimimi's hints):
# Watch what happens during activation on central:
tail -f ~/var/log/apache/access_log
# Check if ~/local is unexpectedly large (slow sync = timeout):
du -sh ~/local
# Test config generation speed on each remote:
time cmk -U
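If `~/local` does turn out to be large, a per-entry breakdown shows where the bulk sits. A small sketch (the function name `largest_entries` is mine; sizes are printed in KiB so a plain numeric sort works):

```shell
# Hypothetical helper: list the largest direct children of directory $1,
# biggest first, limited to $2 entries (default: 5).
# Output format is that of "du -sk": size in KiB, then the path.
largest_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -exec du -sk {} + | sort -rn | head -n "${2:-5}"
}
```

Run it as site user on the central site, e.g. `largest_entries ~/local 10`.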
Workaround until this is fixed in a patch:
Since master + one slave works fine, a simple sequential REST API activation script would bridge the gap:
#!/bin/bash
# Sequential site activation via Checkmk REST API
# Usage: ./activate_sites.sh
HOST_NAME="<your-central-site>"
SITE_NAME="<your-site-name>"
USERNAME="automation"
PASSWORD="<automation-secret>"
API_URL="https://$HOST_NAME/$SITE_NAME/check_mk/api/1.0"
SITES=("slave1" "slave2" "slave3")
for SITE in "${SITES[@]}"; do
    echo ">>> Activating site: $SITE ..."
    # The response body goes to a temp file; only the HTTP status is captured.
    # If-Match "*" skips the ETag check, which newer Checkmk versions may require.
    RESPONSE=$(curl -s -o /tmp/cmk_response.json -w "%{http_code}" \
        --request POST \
        --header "Authorization: Bearer $USERNAME $PASSWORD" \
        --header "Accept: application/json" \
        --header "Content-Type: application/json" \
        --header 'If-Match: "*"' \
        --data "{\"sites\": [\"$SITE\"], \"force_foreign_changes\": false}" \
        "$API_URL/domain-types/activation_run/actions/activate-changes/invoke")
    echo "HTTP Status: $RESPONSE"
    python3 -m json.tool /tmp/cmk_response.json 2>/dev/null
    # curl reports 000 when the request itself failed (DNS, TLS, timeout).
    if [[ "$RESPONSE" == "000" || "$RESPONSE" -ge 400 ]]; then
        echo "!!! ERROR activating $SITE (HTTP $RESPONSE), aborting."
        exit 1
    fi
    echo "--- Done with $SITE. Waiting 30s before next site..."
    sleep 30
done
echo "=== All sites activated successfully."
Adjust the sleep value depending on how long your sync takes. Not pretty, but it gets you through until there's a proper fix.
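If guessing a sleep value feels too fragile, the fixed wait could be replaced by polling the activation run. This is only a sketch under assumptions I have not verified against your version: that the activate-changes response carries the activation ID in a top-level `id` field, and that the `wait-for-completion` endpoint answers 204 when the run has finished and 302 while it is still running. Both helper names are made up for this post:

```shell
# Hypothetical helper: print the top-level "id" field of a JSON file.
# Assumes the activate-changes response stores the activation ID there.
activation_id_from_json() {
    python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["id"])' "$1"
}

# Hypothetical helper: poll wait-for-completion until the run has finished.
# $1 = API base URL, $2 = "user secret", $3 = activation ID, $4 = max polls
# Assumes 204 = finished and 302 = still running; anything else is an error.
wait_for_activation() {
    tries=0
    while [ "$tries" -lt "${4:-60}" ]; do
        # No -L here: we want to see the 302 ourselves instead of following it.
        code=$(curl -s -o /dev/null -w "%{http_code}" \
            --header "Authorization: Bearer $2" \
            --header "Accept: application/json" \
            "$1/objects/activation_run/$3/actions/wait-for-completion/invoke")
        case "$code" in
            204) return 0 ;;
            302) sleep 5 ;;
            *)   echo "Unexpected HTTP status $code while waiting" >&2; return 1 ;;
        esac
        tries=$((tries + 1))
    done
    echo "Gave up waiting for activation $3" >&2
    return 1
}
```

Inside the loop you would then read the ID with `activation_id_from_json /tmp/cmk_response.json` and call `wait_for_activation "$API_URL" "$USERNAME $PASSWORD" "<id>" || exit 1` in place of the sleep. If the hang itself is on the central site, this will simply time out rather than fix anything, but at least the script stops instead of piling the next activation on top.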
Hope that helps. It would be great if you could share what access_log and du -sh ~/local show, so we can narrow down whether it's a sync size problem or a true concurrency bug!
Perhaps also interesting for you, @mbunkus
Greetz Bernd

