Getting historical data for unscheduled downtimes via LQL

Hello

We use checkmk 1.2.6p16 to monitor quite a few hosts and services. (An upgrade to the latest release is in planning… about time!).

For a monthly report, we’d like to see historical data on unscheduled downtimes for a number of hosts and services. By reading checkmk documentation, it’s clear we should be using GET statehist for this purpose. However, there doesn’t seem to be a lot of detailed documentation and/or examples on the usage of statehist.

What we want to know is the hostname and service, date of the downtime, start of the downtime, end of the downtime and duration of the downtime. (The last item can also be calculated from the preceding two, of course).

We already have a working Python script that asks the user for a start date and end date so we can give a range, for example: python reporting_downtimes.py -s 2020-06-01 -e 2020-06-30.

What we’re having trouble with is getting the actual query right so as to get the data as mentioned above. Is this something that’s possible with Livestatus Query Language? If so, do you have any tips on accomplishing this?

Thanks a lot in advance and of course, let me know if I can provide more detail for clarification on anything above.

BTW, I would say the basis for this query could be a configured report of an Availability view in the checkmk Web UI. Does checkmk web UI itself use LQL under the hood to fetch its data? In that case I could just use the same query the checkmk frontend uses to populate itself to get what I need…

The Web UI uses also LQL this assumption is correct.

This has to be fetched from the downtimes table.

   $ lq "GET downtimes\nColumns: host_name service_description author comment entry_time end_time \nFilter: service_description = Filesystem abc\nFilter: host_name = foobar\nColumnHeaders: on"

        "author": "marco.domain",
        "comment": "Problem with firmware updates",
        "end_time": 1596189300,
        "entry_time": 1593590192,
        "duration": 2599200,
        "host_name": "foobar",
        "service_description": "Filesystem abc"

The above mentioned columns and also the ones in your request cannot be queried from the statehist table.

You can also verify this by seeing the columns of this table statehist using the below query: (This will explain the description of each column and type of data stored)
lq "GET columns\nFilter: table = statehist"|less

2 Likes

Hi marco

Many thanks for your answer. One thing that does not quite add up here is the time range. I’m trying to gather information on unscheduled downtimes over a period of time, mostly for the previous month. The way my Python code is structured I can start the script as follows:

python reporting_downtimes.py -s 2020-06-01 -e 2020-06-30

This way I’m trying to get info on all downtimes that occurred in June 2020. end_time and entry_time, however, are properties of a single downtime from the downtimes table, if I understand correctly.

Also, “time” is a property of the statehist table that can be queried but is not available in the downtimes table. Any idea on this?

Thanks!

Hey andreas

Cool, that’s good to know. Any way to “grab” whatever the web UI generates in LQL to send to the backend? Would be very interesting for reuse in my own script :slight_smile:

Anyone have an idea on how to get around this problem? Thank you!

The function used to get the availability data inside the web interface can be found in the file “~/lib/python/cmk/gui/availability.py” --> get_availability_rawdata
There you can see how the query is build.

1 Like