CMK version: 2.2 (patch level unknown)
OS version: RHEL variant (exact version unknown)
Error message: (408) Request Timeout
I am helping a client who is having issues with the API. They are a huge organization with more than 12 distributed sites. They use the API for two primary tasks. First, startup and shutdown scripts on each host automatically set and remove host downtimes. Second, they run tabula rasa any time a host has a CheckMK Discovery service that is not in the OK state.
They are convinced that the API is rate limiting them, but I cannot find any documentation on what the API rate limit might be, or where to configure it.
After some additional conversation, I found out that it is not the tabula rasa scripts that encounter this error. The client states that between 100 and 200 servers per minute call the API to either set or cancel host downtimes as these systems start and stop. They are asking whether there is any “bottleneck with the Apache server instances” and whether increasing the MaxClients variable in the site’s apache.conf would help.
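Independent of any server-side tuning, the startup and shutdown scripts could retry on a 408 instead of giving up. The following is only a minimal sketch, not a confirmed fix: the endpoint path, the Bearer header format and the body fields are my understanding of the REST API and should be checked against the API documentation of the installed version. The base URL, user name, secret and the function names are made up for illustration.

```python
import json
import random
import time
import urllib.error
import urllib.request
from datetime import datetime, timedelta, timezone

# Status codes treated as transient here (an assumption, adjust as needed).
RETRYABLE = {408, 429, 502, 503, 504}


def build_host_downtime_request(base_url, user, secret, host_name, minutes, comment):
    """Build (but do not send) a POST request that schedules a host downtime.

    Assumed endpoint and body layout; verify against your site's API docs.
    """
    start = datetime.now(timezone.utc)
    end = start + timedelta(minutes=minutes)
    body = {
        "downtime_type": "host",
        "host_name": host_name,
        "start_time": start.isoformat(timespec="seconds"),
        "end_time": end.isoformat(timespec="seconds"),
        "comment": comment,
    }
    return urllib.request.Request(
        f"{base_url}/domain-types/downtime/collections/host",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {user} {secret}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def send_with_retry(request, attempts=5, base_delay=1.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Send the request, retrying transient HTTP errors with exponential backoff.

    Random jitter spreads out retries so that many hosts booting at the same
    time do not all hit the API again in the same second.
    """
    for attempt in range(attempts):
        try:
            with opener(request, timeout=30) as response:
                return response.status
        except urllib.error.HTTPError as err:
            # Give up on non-transient errors and on the final attempt.
            if err.code not in RETRYABLE or attempt == attempts - 1:
                raise
        sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

A shutdown script would call something like `send_with_retry(build_host_downtime_request("https://monitoring.example/mysite/check_mk/api/1.0", "automation", "secret", "web01", 30, "planned shutdown"))`. One caveat: a 408 does not prove the request was never processed, so a retried POST could in principle create a duplicate downtime. Connection-level timeouts (`urllib.error.URLError`) are deliberately not retried in this sketch.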
A bit of a “me too” here. We also have a fairly large multisite environment, and we automatically create downtimes during releases and other maintenance work. At peak, these may arrive at a similar rate of 100 to 200 per minute. They all use service_by_query or host_by_query. Our client script logs the 204 (empty) responses from these calls, but sometimes the downtime just never appears (no history either), resulting in significant alert and notification noise. We are running 2.3.0p4 Enterprise at this time. We also had this happen when we were running 2.1.0p38, but at the time it was attributed to another process beating on the system. That factor has since been removed.
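One way to detect the silently missing downtimes would be to tag each request with a unique comment and then look for that comment in the downtime listing a few seconds later. A rough sketch of the matching step follows. It assumes the listing is a JSON document with a top-level `value` list whose entries carry the comment under `extensions`; that layout is an assumption on my part and should be compared with a real response from your version. The function names are hypothetical.

```python
import uuid


def new_marker(prefix="auto-downtime"):
    """Return a comment string with a short random tag that is unique per request."""
    return f"{prefix} [{uuid.uuid4().hex[:12]}]"


def downtime_present(listing, marker):
    """Return True if any downtime in the listing carries the marker in its comment.

    `listing` is the parsed JSON of a downtime listing (assumed layout:
    {"value": [{"extensions": {"comment": "..."}}, ...]}).
    """
    for entry in listing.get("value", []):
        comment = entry.get("extensions", {}).get("comment", "")
        if marker in comment:
            return True
    return False
```

If `downtime_present` is still false after a short wait, the script can log the marker and re-submit the downtime, which at least turns a silent loss into something visible in the logs.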