Livestatus query yields values in wrong field for host

Hello,

this is perhaps something nobody else would ever stumble over…

I’m querying livestatus for the following fields against the statehist table:

duration_part_critical
duration_part_ok
duration_part_unknown
duration_part_unmonitored
duration_part_warning

When the resulting row describes the availability of a host (service_description is empty), the time spent in “DOWN” is reported in “duration_part_warning” instead of “duration_part_critical”.

The version is 2.3.0p22, but other versions report the value in the same field. When I move the field within my query, the value moves to the corresponding position in the output.

Example:

    def _availability_query(self, start, end) -> str:
        return "\n".join(
            [
                "GET statehist",
                "Columns: host_name service_description",
                f"Filter: time >= {int(start.timestamp())}",
                f"Filter: time < {int(end.timestamp())}",
                "Stats: sum duration_part_ok",
                "Stats: sum duration_part_critical",
                "Stats: sum duration_part_unknown",
                "Stats: sum duration_part_warning",
                "Stats: sum duration_part_unmonitored",
            ]
        )

Results in:

    columns: host_name  service_description  ok      crit  unkn  warn        unmon
    row:     'SW09'     ''                   0.9986  0     0     0.00139969  0

I suspect a wrong assignment of the calculated stats, because hosts have the states [0 (UP), 1 (DOWN), 2 (UNKN/UNREACH)] instead of [0 (OK), 1 (WARN), 2 (CRIT), 3 (UNKN)].
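
To make the suspicion concrete, here is a minimal Python sketch of the assignment I assume is happening. The table `COLUMN_BY_STATE` and the helper `column_for_state` are hypothetical names for illustration only; they are not taken from the Livestatus source.

```python
# Assumed assignment of numeric states to statehist columns.
# Hypothetical illustration, not the actual Livestatus implementation.
COLUMN_BY_STATE = {
    0: "duration_part_ok",
    1: "duration_part_warning",
    2: "duration_part_critical",
    3: "duration_part_unknown",
}

# Host states as listed above.
HOST_STATE_NAMES = {0: "UP", 1: "DOWN", 2: "UNKN/UNREACH"}


def column_for_state(state: int) -> str:
    """Return the column that is filled if only the state number is considered."""
    return COLUMN_BY_STATE[state]


# A host that is DOWN has state 1, so its time would land in the "warning" column.
print(HOST_STATE_NAMES[1], "->", column_for_state(1))
```

If this assumption holds, it would explain why the DOWN time shows up under duration_part_warning for host rows.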

Hi @StefanM,

Bug or design?

It is both at the same time – a design error in the naming of the statehist columns. The names duration_part_warning and duration_part_critical are service-centric and do not correspond to their semantic meaning for hosts.

This is a historical leftover from Nagios times.

Workaround: separate queries for hosts and services

The cleanest solution is to split the query into two separate queries.

For services (as before, the semantics are correct):

    GET statehist
    Columns: host_name service_description
    Filter: service_description != ""   # services only
    Stats: sum duration_part_ok
    Stats: sum duration_part_critical
    Stats: sum duration_part_unknown
    Stats: sum duration_part_warning
    Stats: sum duration_part_unmonitored

For hosts (read fields with correct mapping):

    GET statehist
    Columns: host_name service_description
    Filter: service_description = ""    # hosts only
    Stats: sum duration_part_ok         # → UP
    Stats: sum duration_part_warning    # → DOWN  (state 1)
    Stats: sum duration_part_critical   # → UNREACHABLE  (state 2)
    Stats: sum duration_part_unknown    # → not used
    Stats: sum duration_part_unmonitored

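Then rename and map in code. Note that each row starts with the two columns host_name and service_description, so the stats begin at index 2:

    # For host rows:
    host_up          = row[2]   # duration_part_ok
    host_down        = row[3]   # duration_part_warning (!)
    host_unreachable = row[4]   # duration_part_critical (!)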
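
If you would rather keep a single query and relabel on the client side, the mapping can be sketched as follows. This is only a sketch under assumptions: each row starts with host_name and service_description (as in the result row from the question), the stats follow in the order given by `STATS_ORDER`, and the names `relabel_row`, `HOST_LABELS` and `SERVICE_LABELS` are hypothetical.

```python
from typing import Sequence

# Stats columns in the order they appear in the query (assumed; adjust to your query).
STATS_ORDER = (
    "duration_part_ok",
    "duration_part_warning",
    "duration_part_critical",
    "duration_part_unknown",
    "duration_part_unmonitored",
)

# Meaning of each column for host rows and for service rows.
HOST_LABELS = {
    "duration_part_ok": "up",
    "duration_part_warning": "down",
    "duration_part_critical": "unreachable",
    "duration_part_unknown": "unused",
    "duration_part_unmonitored": "unmonitored",
}
SERVICE_LABELS = {
    "duration_part_ok": "ok",
    "duration_part_warning": "warning",
    "duration_part_critical": "critical",
    "duration_part_unknown": "unknown",
    "duration_part_unmonitored": "unmonitored",
}


def relabel_row(row: Sequence, stats_order: Sequence[str] = STATS_ORDER) -> dict:
    """Turn one statehist stats row into a dict with host or service labels."""
    host_name, service_description, *values = row
    if len(values) != len(stats_order):
        raise ValueError("row does not match the expected number of stats columns")
    # An empty service_description marks a host row.
    labels = SERVICE_LABELS if service_description else HOST_LABELS
    result = {"host_name": host_name, "service_description": service_description}
    for column, value in zip(stats_order, values):
        result[labels[column]] = value
    return result
```

Passing the column order explicitly keeps the mapping correct even when the Stats lines are reordered, which is exactly the situation that made the original output confusing.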

Alternative: use the state column directly

Instead of aggregating duration_part_*, you can use the raw state column from statehist and perform the interpretation in the application code itself:

    GET statehist
    Columns: host_name service_description state duration
    Filter: service_description = ""

Then in Python:

    STATE_MAP_HOST = {0: "up", 1: "down", 2: "unreachable", -1: "unmonitored"}

This completely avoids misleading column names.
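
A minimal sketch of that interpretation step, assuming each result row arrives as [host_name, service_description, state, duration] and that duration is a number of seconds. The function name `host_durations` is hypothetical:

```python
from collections import defaultdict

STATE_MAP_HOST = {0: "up", 1: "down", 2: "unreachable", -1: "unmonitored"}


def host_durations(rows):
    """Sum the durations per host and per host state name.

    Each row is expected as [host_name, service_description, state, duration].
    """
    totals = defaultdict(lambda: defaultdict(float))
    for host_name, _service_description, state, duration in rows:
        # Unknown state numbers are kept under a generic label instead of being dropped.
        label = STATE_MAP_HOST.get(int(state), f"state_{state}")
        totals[host_name][label] += float(duration)
    return {host: dict(per_state) for host, per_state in totals.items()}
```

The result is a plain dict per host, for example `{"SW09": {"up": ..., "down": ...}}`. To get fractions comparable to duration_part_*, divide each sum by the length of the queried time range.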

Greetz Bernd

Take a closer look at:
