API query limits

CMK version: 2.2 (patch level unknown)
OS version: RHEL variant (exact version unknown)

Error message: (408) Request Timeout

I am helping a client who is having issues with the API. They are a large organization with more than 12 distributed sites. They use the API for two main tasks. First, startup and shutdown scripts on each host automatically put the host into downtime and remove the downtime again. Second, they use the API to run tabula rasa any time a host's Check_MK Discovery service is not in the OK state.

They are convinced that the API is rate limiting them, but I cannot find any documentation on what the API rate limit might be, or where to configure it.

Anyone have any guidance here?

I don’t think you have a rate limit here.
Only the very time consuming

Why? That makes no sense.

That is how the customer wishes to operate. They don’t want to have to deal with discovery scans.

I am hoping to find some official documentation. This client will respond better to that.

It’s kinda hard to officially document something that doesn’t exist…

@marcel.arentz do you want to provide an ‘official’ statement about whether or not there are rate limits in the Checkmk REST API?

@briand I assume you mean the REST API.

Yes, the REST API. If there is no official documentation, that’s fine. I have attempted to get their answer and heard from the development team.

Thank you!

After some additional conversation, I found out it’s not the tabula rasa scripts encountering this error. This client states that between 100 and 200 servers per minute reach out to the API to either set or cancel host downtimes as these systems start and stop. They are asking if there is any “bottleneck with the Apache server instances” and whether increasing the MaxClients variable in the site’s apache.conf would help.

Any thoughts?
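
Independent of any Apache tuning, the scripts themselves could retry on a 408 instead of giving up. Below is a minimal sketch of retrying with exponential backoff and random jitter, so that 100 to 200 hosts do not all retry at the same moment. The helper names (`backoff_delays`, `call_with_retry`, `make_downtime_sender`) are made up for illustration, and the endpoint path, payload fields, and Bearer header reflect my understanding of the REST API's host downtime call, so please check them against the interactive API documentation of your site. `base_url` is assumed to look like `https://server/site/check_mk/api/1.0`.

```python
import json
import random
import time
import urllib.error
import urllib.request

# Status codes treated as "try again later" (an assumption, adjust as needed).
RETRYABLE = {408, 429, 502, 503, 504}


def backoff_delays(retries, base=1.0, cap=30.0, rng=random.random):
    """Yield one jittered delay per retry: rng() * min(cap, base * 2**n)."""
    for n in range(retries):
        yield rng() * min(cap, base * 2 ** n)


def call_with_retry(send, attempts=5, sleep=time.sleep, **backoff):
    """Call send() up to `attempts` times while it returns a retryable status.

    `send` is any zero-argument callable that returns an HTTP status code.
    """
    status = send()
    for delay in backoff_delays(attempts - 1, **backoff):
        if status not in RETRYABLE:
            break
        sleep(delay)
        status = send()
    return status


def make_downtime_sender(base_url, user, secret, host, start, end):
    """Build a send() that schedules a host downtime (assumed endpoint/payload)."""
    body = json.dumps({
        "downtime_type": "host",
        "host_name": host,
        "start_time": start,  # ISO 8601 timestamps
        "end_time": end,
        "comment": "set by shutdown script",
    }).encode()
    request = urllib.request.Request(
        f"{base_url}/domain-types/downtime/collections/host",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {user} {secret}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )

    def send():
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return response.status
        except urllib.error.HTTPError as err:
            return err.code  # e.g. 408, handled by call_with_retry

    return send
```

A script would then call `call_with_retry(make_downtime_sender(...))` and treat anything other than a 2xx result as a failure worth logging. Connection-level errors (`urllib.error.URLError`) are deliberately not caught here.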

A bit of a “me too” here. We also have a fairly large multisite environment, and we automatically create downtimes during releases and other maintenance work. These may come in at a similar rate of 100 to 200 per minute at peak. They all use service_by_query or host_by_query. The client logs the 204 (empty) responses from these calls, but sometimes the downtime just never appears (no history either), resulting in significant alert and notification noise. We are running 2.3.0p4 Enterprise at this time. We also had this happen when we were running 2.1.0p38, but at the time it was attributed to another process beating on the system, and that factor has since been removed.
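
Since a 204 alone evidently does not guarantee the downtime shows up, one client-side workaround is to tag each request with a unique marker in the downtime comment and then poll the downtime list until that marker appears. A rough sketch of the polling part follows. `new_marker` and `wait_for_downtime` are hypothetical helper names, and `list_comments` stands for whatever function you write to fetch the comments of the current downtimes from your site; how that query looks is not shown here and depends on your version.

```python
import time
import uuid


def new_marker(prefix="auto-downtime"):
    """Return a unique tag to embed in the downtime comment."""
    return f"{prefix}-{uuid.uuid4().hex[:12]}"


def wait_for_downtime(list_comments, marker, tries=5, delay=2.0, sleep=time.sleep):
    """Return True once a listed downtime comment contains `marker`.

    `list_comments` is a zero-argument callable returning an iterable of
    comment strings for the currently known downtimes.
    """
    for attempt in range(tries):
        if any(marker in comment for comment in list_comments()):
            return True
        if attempt < tries - 1:
            sleep(delay)  # give a distributed setup time to catch up
    return False
```

If `wait_for_downtime` returns False, the script can log the marker and resubmit, which at least turns a silent loss into something visible.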
