Dynamic host management Error after upgrade to 2.1.0

CMK version:
2.1.0p2

OS version:
docker on Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-117-generic x86_64)

Error message:
[Dynamic host management]
Phase 1.1: Executing on site “swarm01” Step has been finished OK 23.5 ms 2022-06-14 21:46:26
Phase 1.1: Fetching from site “swarm01” Step has been finished OK 99.2 ms 2022-06-14 21:46:33
Phase 2.1: Extracting result Step has been finished OK 5.96 µs 2022-06-14 21:46:33
Phase 2.2: Fetching existing hosts Step has been finished OK 774 ms 2022-06-14 21:46:34
Phase 2.3: Updating config An exception occured ERROR 502 ms 2022-06-14 21:46:35
21:46:35 ERROR An exception occured
Traceback (most recent call last):
  File "/omd/sites/cmk/lib/python3/cmk/cee/dcd/connectors/piggyback.py", line 240, in _execute_phase2
    deleted_host_names = self._delete_vanished_hosts(
  File "/omd/sites/cmk/lib/python3/cmk/cee/dcd/connectors/piggyback.py", line 555, in _delete_vanished_hosts
    self._web_api.delete_hosts(hosts_to_delete)
  File "/omd/sites/cmk/lib/python3/cmk/cee/dcd/web_api.py", line 239, in delete_hosts
    resp.raise_for_status()
  File "/omd/sites/cmk/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:5000/cmk/check_mk/api/1.0/domain-types/host_config/actions/bulk-delete/invoke

Most of the time this results in one host being deleted but the change not being applied.
Any idea where that Internal Server Error could be logged?

Regards
Sintbert

The error still persists: every time it has multiple hosts to remove, I get that error.
Does anyone have an idea where I could find the log file for that 500 Server Error?

Regards Sintbert

I can think of two locations for the error messages.
The first is “~/var/log/web.log” and the second is “~/var/log/apache/…”

@andreas-doehler Thanks a lot. I have found the log.


2022-06-17 08:16:11,086 [40] [cmk.gui.wsgi.rest_api 2541732] Unhandled exception (Crash-ID: bf2d9f9a-ee15-11ec-90d4-02420a001310)
Traceback (most recent call last):
  File "/omd/sites/cmk/lib/python3/cmk/gui/wsgi/applications/rest_api.py", line 483, in _wsgi_app
    return wsgi_app(environ, start_response)
  File "/omd/sites/cmk/lib/python3/cmk/gui/wsgi/applications/rest_api.py", line 240, in __call__
    wsgi_app = self.endpoint.wrapped(ParameterDict(path_args))
  File "/omd/sites/cmk/lib/python3/cmk/gui/plugins/openapi/restful_objects/decorators.py", line 814, in _wrapper
    response = func(param)
  File "/omd/sites/cmk/lib/python3/cmk/gui/plugins/openapi/restful_objects/decorators.py", line 698, in _validating_wrapper
    response = self.func(_params)
  File "/omd/sites/cmk/lib/python3/cmk/gui/plugins/openapi/endpoints/host_config.py", line 508, in bulk_delete
    host.folder().delete_hosts([host.name()])
  File "/omd/sites/cmk/lib/python3/cmk/gui/watolib/hosts_and_folders.py", line 2670, in delete_hosts
    self._delete_host_files(host_names)
  File "/omd/sites/cmk/lib/python3/cmk/gui/watolib/hosts_and_folders.py", line 2707, in _delete_host_files
    delete_hosts(
  File "/omd/sites/cmk/lib/python3/cmk/gui/watolib/check_mk_automations.py", line 244, in delete_hosts
    _automation_serialized(
  File "/omd/sites/cmk/lib/python3/cmk/gui/watolib/check_mk_automations.py", line 76, in _automation_serialized
    serialized_result=check_mk_remote_automation_serialized(
  File "/omd/sites/cmk/lib/python3/cmk/gui/watolib/automations.py", line 198, in check_mk_remote_automation_serialized
    sync_changes_before_remote_automation(site_id)
  File "/omd/sites/cmk/lib/python3/cmk/gui/watolib/automations.py", line 236, in sync_changes_before_remote_automation
    manager.start([site_id], activate_foreign=True, prevent_activate=True)
  File "/omd/sites/cmk/lib/python3/cmk/gui/watolib/activate_changes.py", line 618, in start
    self._site_snapshot_settings = self._get_site_snapshot_settings(
  File "/omd/sites/cmk/lib/python3/cmk/gui/watolib/activate_changes.py", line 847, in _get_site_snapshot_settings
    site_status = self._get_site_status(site_id, site_config)[0]
  File "/omd/sites/cmk/lib/python3/cmk/gui/watolib/activate_changes.py", line 451, in _get_site_status
    site_status = sites_states().get(site_id, SiteStatus({}))
  File "/omd/sites/cmk/lib/python3/cmk/gui/sites.py", line 78, in states
    _ensure_connected(user, force_authuser)
  File "/omd/sites/cmk/lib/python3/cmk/gui/sites.py", line 186, in _ensure_connected
    _set_livestatus_auth(user, force_authuser)
  File "/omd/sites/cmk/lib/python3/cmk/gui/sites.py", line 354, in _set_livestatus_auth
    user_id = _livestatus_auth_user(user, force_authuser)
  File "/omd/sites/cmk/lib/python3/cmk/gui/sites.py", line 374, in _livestatus_auth_user
    if not user.may("general.see_all"):
  File "/omd/sites/cmk/lib/python3/cmk/gui/utils/logged_in.py", line 369, in may
    raise PermissionError(
PermissionError: Required permissions not declared for this endpoint.
Endpoint: <Endpoint cmk.gui.plugins.openapi.endpoints.host_config:bulk_delete>
Permission: general.see_all
Used permission: {'general.see_all', 'wato.manage_hosts', 'wato.all_folders'}
Declared: AllPerm([{wato.manage_hosts}, {wato.all_folders}?)

Now the question is what to do with this…
It seems the user has the required permission, but the endpoint has not declared it.

Regards
Sintbert

Hi,
we have exactly the same problem. Did you solve the issue?

Michael

Hi Michael

Sadly, I only have a temporary fix: delete all the failed/shutdown containers in Checkmk. It then works as long as you don’t have more than one failed/shutdown container at once.

I have no idea how to fix the permission error that causes this.

Regards
Sintbert

So, the problem seems to be here:

checkmk/cmk/gui/plugins/openapi/endpoints/host_config.py

PERMISSIONS = permissions.AllPerm(
    [
        permissions.Perm("wato.edit"),
        permissions.Perm("wato.manage_hosts"),
        permissions.Optional(permissions.Perm("wato.all_folders")),
    ]
)

....

@Endpoint(
    constructors.domain_type_action_href("host_config", "bulk-delete"),
    ".../delete",
    method="post",
    request_schema=request_schemas.BulkDeleteHost,
    permissions_required=PERMISSIONS,
    output_empty=True,
)
def bulk_delete(params):
    """Bulk delete hosts"""
    user.need_permission("wato.edit")
    body = params["body"]
    for host_name in body["entries"]:
        host = Host.load_host(host_name)
        host.folder().delete_hosts([host.name()], automation=delete_hosts)
    return Response(status=204)

That PERMISSIONS would need the addition of “general.see_all”
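
To illustrate why I think so, here is a toy model of the "declared vs. used" check that the traceback describes. This is not Checkmk's code: `EndpointContext`, `UndeclaredPermission` and `may` are hypothetical names I made up, and the declared set is taken from the "Declared:" line in the web.log entry above. It only sketches the idea that a permission lookup fails when the endpoint never declared it, even if the user actually holds that permission.

```python
from dataclasses import dataclass, field


class UndeclaredPermission(Exception):
    """Raised when code asks for a permission the endpoint never declared."""


@dataclass
class EndpointContext:
    # Hypothetical stand-in for an endpoint's declared permissions.
    declared: frozenset
    used: set = field(default_factory=set)

    def may(self, user_permissions, permission):
        # Record every permission that gets looked up during the request.
        self.used.add(permission)
        # The lookup itself is rejected if the endpoint did not declare it,
        # regardless of whether the user holds the permission.
        if permission not in self.declared:
            raise UndeclaredPermission(
                f"Permission {permission!r} not declared for this endpoint"
            )
        return permission in user_permissions


# Permissions the user holds (the "Used permission" set from the log).
user_permissions = {"general.see_all", "wato.manage_hosts", "wato.all_folders"}

# Declared set as reported in the log entry: general.see_all is missing.
broken = EndpointContext(frozenset({"wato.manage_hosts", "wato.all_folders"}))

# Same endpoint with general.see_all added to the declaration.
patched = EndpointContext(
    frozenset({"wato.manage_hosts", "wato.all_folders", "general.see_all"})
)
```

With `broken`, a call to `may(user_permissions, "general.see_all")` raises, which mirrors the PermissionError in the log; with `patched`, the same call simply returns True.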

Regards
Sintbert

Thanks! We’ve opened a support ticket for that today. It is really disappointing to pay support credits for bugs within checkmk source code but we need to have that fixed. We didn’t have the issue with the 2.1.0 betas. We already sent a mail to checkmk feedback one week ago but didn’t get any answer.
Edit: I’ve just seen that we got an answer to the ticket: Issue is already known internally.

Michael

Ah, thanks too.
I sent an email to feedback@checkmk.com with this thread earlier today.
It opened a Jira Issue: FEED-7131: Dynamic host management Error after upgrade to 2.1.0

Unfortunately not fixed with 2.1.0p3

Seems to have been fixed with 2.1.0p4 :grinning:

PERMISSIONS = permissions.AllPerm(
    [
        permissions.Perm("wato.edit"),
        permissions.Perm("wato.manage_hosts"),
        permissions.Optional(permissions.Perm("wato.all_folders")),
        permissions.Ignore(
            permissions.AnyPerm(
                [
                    permissions.Perm("bi.see_all"),
                    permissions.Perm("general.see_all"),
                    permissions.Perm("mkeventd.seeall"),
                ]
            )
        ),
    ]
)

I just upgraded from 2.0p26 to 2.1p9 and have the same error in Dynamic host management. RHEL 7 with SELinux disabled.
What could it be?

10:29:56 ERROR An exception occured
Traceback (most recent call last):
  File "/omd/sites/mysite/lib/python3/cmk/cee/dcd/connectors/piggyback.py", line 229, in _execute_phase2
    cmk_hosts = self._web_api.get_all_hosts()
  File "/omd/sites/mysite/lib/python3/cmk/cee/dcd/web_api.py", line 225, in get_all_hosts
    resp.raise_for_status()
  File "/omd/sites/mysite/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:5000/mysite/check_mk/api/1.0/domain-types/host_config/collections/all

Please check your web.log, only there will you see the source of the Internal Server Error.

Hi!
Unfortunately web.log has nothing interesting inside.
In ~/var/log/apache/access_log I can see that:

- - - [01/Aug/2022:10:00:17 +0200] "GET /mysite/check_mk/api/1.0/domain-types/host_config/collections/all HTTP/1.1" 500 5049662 "-" "python-requests/2.27.0"

I can see the error from the GUI in dcd.log as well.
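
Since web.log shows nothing, one option might be to repeat the failing request by hand and look at the body of the 500 response. Below is a minimal sketch using only the Python standard library. The URL is the one from the traceback; the `Bearer <user> <secret>` header is how I understand REST API authentication to work, so please verify it against the REST API documentation of your version. The function names and the user/secret values are placeholders of mine, not anything from Checkmk.

```python
import urllib.error
import urllib.request


def build_request(base_url, site, user, secret):
    """Build the same GET request that the DCD sends for all hosts."""
    url = (
        f"{base_url}/{site}/check_mk/api/1.0"
        "/domain-types/host_config/collections/all"
    )
    req = urllib.request.Request(url, method="GET")
    # Assumption: automation user and secret are sent as a Bearer token.
    req.add_header("Authorization", f"Bearer {user} {secret}")
    req.add_header("Accept", "application/json")
    return req


def fetch_status_and_body(req, timeout=60, max_chars=2000):
    """Send the request and return (status, start of body), also for HTTP errors."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # A 500 ends up here; its body may still carry the error details.
        status, raw = err.code, err.read()
    # The access log reports a response of several MB, so only keep the start.
    return status, raw.decode("utf-8", errors="replace")[:max_chars]
```

Run as the site user, this could be used like `status, body = fetch_status_and_body(build_request("http://localhost:5000", "mysite", "automation", "<secret>"))` followed by printing both values.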

Hi, same error here. Upgraded from 2.0.0p17 to 2.1.0p9, CEE.
I checked host_config.py, and the PERMISSIONS part does include “general.see_all”, as was suggested.

Does anyone have a working solution, a workaround, or any info on an update?

Please check whether you have corresponding errors in the API's log file.
If you use docker then it could be in some folder like /var/lib/docker/volumes/containername/_data/cmk/var/log/web.log

No, unfortunately nothing in the web.log. Exactly the same error and symptoms as DanielDS has.
Does anyone know if the CMK team is aware of this error and is working on an update?

Hello, did you find a solution? I have the same error after the upgrade from 2.0.0p28 to 2.1.0p15, CEE on CMA.

Thank you for an answer

Michal

from dcd.log:
2022-11-01 14:48:36,501 [40] [cmk.dcd.connection_XX] Trace:
Traceback (most recent call last):
  File "/omd/sites/icinga1/lib/python3/cmk/cee/dcd/connectors/utils.py", line 173, in execute
    self._execute_sync()
  File "/omd/sites/icinga1/lib/python3/cmk/cee/dcd/connectors/utils.py", line 227, in _execute_sync
    self._execute_phase2(phase1_result)
  File "/omd/sites/icinga1/lib/python3/cmk/cee/dcd/connectors/piggyback.py", line 229, in _execute_phase2
    cmk_hosts = self._web_api.get_all_hosts()
  File "/omd/sites/icinga1/lib/python3/cmk/cee/dcd/web_api.py", line 225, in get_all_hosts
    resp.raise_for_status()
  File "/omd/sites/icinga1/local/lib/python3/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:5002/icinga1/check_mk/api/1.0/domain-types/host_config/collections/all

You should find the detailed error message for your 500 Server Error in the web.log

Only there will you find the root cause of the error.
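
If the file is large, a small script can help to pull out the relevant entries. The following is a rough sketch that assumes web.log entries look like the one quoted earlier in this thread (timestamp, level, logger, "Unhandled exception (Crash-ID: …)", followed by an indented traceback). The regular expressions and the function name are my own and may need adjusting for other versions.

```python
import re

# Header line of a crash entry, modelled on the example earlier in the thread.
HEADER = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>\d+)\] \[(?P<logger>[^\]]+)\] "
    r"Unhandled exception \(Crash-ID: (?P<crash_id>[0-9a-f-]+)\)"
)
# Any line that starts a new log record.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def find_unhandled_exceptions(lines):
    """Return one dict per crash entry, including the exception line if found."""
    entries = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        header = HEADER.match(line)
        if header:
            current = {**header.groupdict(), "exception": None}
            entries.append(current)
            continue
        if RECORD_START.match(line):
            # A different log record begins; stop collecting for the old one.
            current = None
            continue
        if current is None or current["exception"] is not None:
            continue
        # Skip blank lines, indented traceback frames and the Traceback header.
        if not line or line[0].isspace() or line.startswith("Traceback "):
            continue
        # First unindented line after the frames is the exception message.
        current["exception"] = line
    return entries
```

As the site user you would open ~/var/log/web.log, pass the file object to `find_unhandled_exceptions`, and print the timestamp, Crash-ID and exception of each returned entry.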

It seems that this is a slightly different issue.

There is no record in the Apache error log, just an entry in the access log:
[01/Nov/2022:15:15:40 +0100] "GET /icinga1/check_mk/api/1.0/domain-types/host_config/collections/all HTTP/1.1" 500 7627506 "-" "python-requests/2.27.1"
There are no further details, and there is no record of this case in web.log either.

I found this Checkmk KB article about it: Debug DCD issues - Checkmk Knowledge Base.
Does anyone know exactly how to apply the workaround: “A workaround would be to remove all custom values and run the DCD again. This issue is already reported internally.”?