Dynamic host management Error after upgrade to 2.1.0

checkFK · November 3, 2022, 12:00pm

Hi.

I can confirm that the error still exists in 2.1p14.
We updated from 1.6p29 to 2.0pX to 2.1p14.
Tbh I don’t know whether Dynamic host management worked in the 2.0 release, because we did not test it.

The only log entry I can find regarding this issue is in apache/access.log:

- - [03/Nov/2022:12:40:38 +0100] "GET /SITENAME/check_mk/api/1.0/domain-types/host_config/collections/all HTTP/1.1" 500 23751736 "-" "python-requests/2.28.1"

If I go to the URL, I get the following (screenshot):

I have no idea how to check all the *.mk files by hand if I don’t know what to look for.
My next question: is this 500 error related to the errors regarding the tag “type_of_network_switche”?

Thanks for the help.

EDIT:
In dcd.log.1 I found the same output that the UI shows:

2022-11-03 12:56:58,126 [40] [cmk.dcd.aulc_dynamic_esxi] Error during sync: 500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:5000/SITENAME/check_mk/api/1.0/domain-types/host_config/collections/all
2022-11-03 12:56:58,126 [40] [cmk.dcd.aulc_dynamic_esxi] Trace:
Traceback (most recent call last):
  File "/omd/sites/SITENAME/lib/python3/cmk/cee/dcd/connectors/utils.py", line 173, in execute
    self._execute_sync()
  File "/omd/sites/SITENAME/lib/python3/cmk/cee/dcd/connectors/utils.py", line 227, in _execute_sync
    self._execute_phase2(phase1_result)
  File "/omd/sites/SITENAME/lib/python3/cmk/cee/dcd/connectors/piggyback.py", line 229, in _execute_phase2
    cmk_hosts = self._web_api.get_all_hosts()
  File "/omd/sites/SITENAME/lib/python3/cmk/cee/dcd/web_api.py", line 225, in get_all_hosts
    resp.raise_for_status()
  File "/omd/sites/SITENAME/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:5000/SITENAME/check_mk/api/1.0/domain-types/host_config/collections/all

Casper · November 3, 2022, 2:14pm

Not much to add here except that I’m having the exact same issue. Issue started right after updating from 2.0 to 2.1. Was working great on 2.0. Currently running 2.1.0p14 (enterprise edition).

edit:
We’re using DCD in combination with the AWS special agent to fetch our EC2 instance status.

15:15:14 ERROR An exception occured
Traceback (most recent call last):
  File "/omd/sites/icinga/lib/python3/cmk/cee/dcd/connectors/piggyback.py", line 229, in _execute_phase2
    cmk_hosts = self._web_api.get_all_hosts()
  File "/omd/sites/icinga/lib/python3/cmk/cee/dcd/web_api.py", line 225, in get_all_hosts
    resp.raise_for_status()
  File "/omd/sites/icinga/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:5000/icinga/check_mk/api/1.0/domain-types/host_config/collections/all

athomaidis · November 5, 2022, 4:38am

Hi Casper,

two things which could help us to narrow down the issue:

Can you go to ~/etc/check_mk/conf.d/wato/ and grep for the value “Switche”? I’m curious how many hosts are using this attribute. Btw, is this a custom tag?

Please run cmk-update-config -vv on the master site.

If dcd is still failing, we need to get the API endpoint working. Maybe you can share the hosts.mk where this attribute is used?

Best regards
Anastasios

Casper · November 7, 2022, 8:16am

Thank you for your help Athomaidis.

I grepped for “Switch” and “Switche” in the location you specified. There are a few hits, but not in the folder that holds the hosts we use DCD for. So I don’t think our issue is related to a switch attribute.

We use the AWS special agent with piggyback data to get our EC2 instances into CheckMK. With DCD we prefix the hostnames so we know their function by looking at the name.
The hostnames contain information like IP address and AWS region that I’d rather not share here publicly.
Is it possible to share the hosts.mk with you privately?

Kind regards,
Casper

checkFK · November 7, 2022, 10:14am

Hi - Thanks for your answer.

Yes, it is a custom tag.
I need to clean up the output of the grep command before I can share it. Are you looking for anything more specific than “Switche”?

I ran the cmk-update-config -vv command on the master, but the dcd is still failing.

Update:
With the following command, I get empty output:
find etc/check_mk/conf.d/wato/ -name '*.mk' -exec cat {} \; | grep "Switche"

With find etc/check_mk/conf.d/wato/ -name '*.mk' -exec cat {} \; | grep -i Switche I get what looks like all the hosts we have (7k).
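
A side note on the two commands above: piping every file through cat drops the file names, so the grep output cannot show which folder a hit came from. Below is a minimal Python sketch that counts matching lines per *.mk file instead. The function name find_tag_usage is made up for illustration and is not part of Checkmk; the directory is the one used in the find command.

```python
from pathlib import Path


def find_tag_usage(root, needle, ignore_case=False):
    """Return {file path: number of matching lines} for all *.mk files under root."""
    if ignore_case:
        needle = needle.lower()
    hits = {}
    for path in sorted(Path(root).rglob("*.mk")):
        count = 0
        # errors="replace" keeps the scan going if a file contains odd bytes
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                haystack = line.lower() if ignore_case else line
                if needle in haystack:
                    count += 1
        if count:
            hits[str(path)] = count
    return hits
```

Calling find_tag_usage("etc/check_mk/conf.d/wato", "switche", ignore_case=True) from the site user's home directory should give one entry per file with hits, which makes it easier to see whether the matches are spread over all folders or concentrated in a few.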

checkFK · November 10, 2022, 7:56am

Is there anything that I can do to solve this problem?

chauhan_sudhir · November 20, 2022, 3:11pm

Have you tried accessing this URL “http://localhost:5000/SITENAME/check_mk/api/1.0/domain-types/host_config/collections/all” directly?
The response should tell you more about the host data and whether there is a problem with it.

Also, do you use distributed monitoring?
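
If a browser is not handy, the same URL can be requested from a script. This is only a rough sketch: it assumes the REST API accepts an automation user via an "Authorization: Bearer USER SECRET" header, and the function name, user name and secret are placeholders, not values from this thread.

```python
import urllib.request


def build_host_config_request(base_url, site, user, secret):
    """Prepare (but do not send) a GET request for the host_config collection."""
    url = (
        f"{base_url.rstrip('/')}/{site}"
        "/check_mk/api/1.0/domain-types/host_config/collections/all"
    )
    headers = {
        # Assumption: automation user name and secret, separated by a space
        "Authorization": f"Bearer {user} {secret}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")
```

Passing the result to urllib.request.urlopen() sends the request. On a 500 response urlopen raises urllib.error.HTTPError, and calling read() on that exception returns the response body, which is where the actual error text should be.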

checkFK · November 21, 2022, 4:00pm

What do you mean by “directly”?
I modified the URL like this, replacing http://localhost:5000 with https://checkmk.DOMAIN:

https://checkmk.DOMAIN/SITENAME/check_mk/api/1.0/domain-types/host_config/collections/all

You can see a part of the output in my screenshots.

Yes, we are using distributed monitoring.
CEE 2.1p14 on CentOS7.9.

chauhan_sudhir · November 23, 2022, 9:33am

Are any of your slave sites disabled?

checkFK · November 23, 2022, 10:01am

No, we have 5 Satellites and 1 Primary Server. All sites are online and Distributed Monitoring works fine.

Not sure if this is related to the issue: before the update, once we had logged in to our primary site via the UI, we didn’t need to log in to the satellites manually. Since the update from 1.6 via 2.0 to 2.1 we need to log in on every site.
We use LDAP for login.

e.g. this is our primary:
https://checkmk.DOMAIN/PRIMARY/check_mk/

Before the update I was authenticated to the satellites as well.
e.g. this is a satellite:
https://checkmk.DOMAIN/SATELLITE/check_mk/

At the moment, we have to log in separately on every site (primary and satellites).

jwiederh · November 23, 2022, 10:59am

Hi, did you fix the error message in the access log regarding “Switche” from your first screenshot?

If not, that might be a good place to start troubleshooting.

chauhan_sudhir · November 23, 2022, 2:32pm

In the URL, the hostname should be that of the server the DCD runs against.

Just paste the URL into the browser.

checkFK · November 23, 2022, 5:40pm

We got it.
It would have been helpful to know what to search for in the URL output. We finally found the hosts that used a non-existent tag. Once we had solved the “Switch” tag issue, we hit a further one with another non-existent tag.

Thanks for helping out.

It seems this werk didn’t fix it completely, is that possible?
Tags: Fix "Element "" does not exist anymore in tag input preview (checkmk.com)

chauhan_sudhir · November 23, 2022, 11:25pm

Afaik, the werk is about fixing the tag in the tag UI, not this problem. If you enable DCD debugging (raise log_level to debug) in the global settings, there is still a chance of seeing the exact error message. Alternatively, you can access the URL directly to see what the actual problem is. Normally it appears in the JSON response from the URL.
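
As a rough illustration of reading that JSON response: the access.log line earlier in the thread shows a 500 response of 23751736 bytes, which is too large to read comfortably in a browser, so it helps to pull out only the summary fields. The sketch below assumes the error body is a JSON object with "status", "title" and "detail" keys; those key names are an assumption and may differ between versions, so anything else is shown truncated. The function name is hypothetical.

```python
import json


def summarize_error_body(body, limit=200):
    """Return a short one-line summary of an API error response body."""
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON (for example an HTML error page): show the beginning only
        return str(body)[:limit]
    if not isinstance(data, dict):
        return str(data)[:limit]
    # Assumed key names; fall back to the raw JSON if none of them is present
    parts = [str(data[key]) for key in ("status", "title", "detail") if key in data]
    return " | ".join(parts) if parts else json.dumps(data)[:limit]
```

The body can come from wherever the response was captured, for example a file saved from the browser.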

If all is good at your end, I recommend marking this as solved so that others can also benefit from it.
