We use a self-written Python API client to call the API endpoints.
We have a workflow in which virtual machines are set up automatically and added to CheckMK. All actions that need to be performed in CheckMK are put into a task queue, which is then processed in batches.
Below is a test replica of the original logic to give you a picture of how the functions are called. The calls list stands in for the task queue here.
calls = [
    {
        'method': api.create_host,
        'params': {
            'agent_type': 'cmk-agent',
            'host_type': '<type>',
            'hostname': '<hostname>',
        }
    },
    {
        'method': api.discover_services,
        'params': {
            'hostname': '<hostname>',
            'mode': 'fix_all'
        }
    }
]
for method in calls:
    method['method'](**method['params'])
The method we use to create a host (this always works without problems):
def create_host(.....):
    [...]
    data = {
        'host_name': hostname,
        'folder': self.api_folder + '/' + host_type,
        'attributes': {
            'tag_<company-name>_host_type': host_type,
            'tag_agent': agent_type
        }
    }
    r = self._post_data("domain-types/host_config/collections/all", data)
    [...]
    # logic for setting parents, error handling etc.
The method to discover services for the host created before:
def discover_services(self, hostname, mode='fix_all'):
    data = {
        "host_name": hostname,
        "mode": mode
    }
    r = self._post_data('domain-types/service_discovery_run/actions/start/invoke', data)
When used like this, the service discovery fails most of the time. During our testing and debugging, the discovery sometimes worked and sometimes did not.
We saw that waiting a little after host creation before starting the service discovery helped in some tries, but not always. The results were a bit better when we used our API client directly from the CLI, but even there it did not work every time.
We can see all the API calls in the Apache log, and each call waits until the previous one has finished. The return codes are always as expected, but the check_table field in the discovery response is empty.
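For illustration, a minimal sketch of how a wait-and-retry around the discovery call could look. It is not our production code. The helper name `discover_with_retry` is made up for this post, and it assumes that the discovery call returns the parsed JSON response as a dict with `check_table` at the top level; if the field is nested in your response, the lookup needs to be adjusted.

```python
import time


def discover_with_retry(discover, hostname, attempts=5, delay=2.0, sleep=time.sleep):
    """Call discover(hostname) until the response has a non-empty check_table.

    `discover` is a hypothetical callable (e.g. a wrapper around
    api.discover_services) that returns the parsed response as a dict.
    Returns the last response, whether or not check_table was filled.
    """
    response = {}
    for attempt in range(1, attempts + 1):
        response = discover(hostname)
        # Assumption: check_table sits at the top level of the response.
        if response.get("check_table"):
            return response
        # Wait before the next try, but not after the final one.
        if attempt < attempts:
            sleep(delay)
    return response
```

The caller still has to check whether `check_table` is filled in the returned response, because the sketch gives up silently after the last attempt.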
The only way service discovery works reliably is by running cmk -Iv <hostname> directly on the monitoring server. This works every time, and we have now implemented a workaround based on it.
In version 1.6 we were using the WebAPI, which always worked perfectly fine.
I would really appreciate any insight or hints on this issue.
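A minimal sketch of what such a workaround can look like, assuming it runs as the site user on the monitoring server, where `cmk` is on the PATH. The function names are hypothetical and the error handling is reduced to checking the exit code.

```python
import subprocess


def build_discovery_command(hostname):
    """Return the argument list for the CLI discovery of one host."""
    return ["cmk", "-Iv", hostname]


def run_cli_discovery(hostname, run=subprocess.run):
    """Run `cmk -Iv <hostname>` and report whether it exited with code 0.

    `run` is injectable so the function can be exercised without a
    Checkmk installation; by default it is subprocess.run.
    """
    result = run(
        build_discovery_command(hostname),
        capture_output=True,  # keep stdout/stderr out of the caller's terminal
        text=True,
        check=False,          # evaluate the return code ourselves
    )
    return result.returncode == 0
```

Passing the arguments as a list avoids shell quoting problems with unusual host names.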
Thank you and best regards