Help with "Query elasticsearch logs" - CMK/KIBANA results do not match

Hey,

I’m looking for some help building a working search query for the “Query elasticsearch logs” check. At the moment the results are quite different from the results I’m getting in the Kibana web GUI (CMK API call: 450,000 vs. Kibana GUI: 144).

CMK version: Checkmk Enterprise Edition 2.4.0p21

Josef :slight_smile:

Hey Josef,

I had a look at the plugin source code to understand what query CMK actually sends to Elasticsearch.

checkmk/packages/cmk-plugins/cmk/plugins/elasticsearch at master · Checkmk/checkmk

The actual ES query CMK builds:

{
  "query": {
    "bool": {
      "must": [
        {"query_string": {"query": "<your pattern>"}},
        {"range": {"@timestamp": {"gte": "now-<timerange>s", "lt": "now"}}}
      ]
    }
  }
}

Three things stand out that could explain the difference between 450,000 (CMK) and 144 (Kibana):

1. Timerange is in seconds, not minutes.
The value you enter in the rule is passed directly as now-Xs to Elasticsearch. So 60 = last 60 seconds, 3600 = last hour. What value do you have configured, and what time range are you using in Kibana when you see 144?

2. Without a fieldname, CMK searches across all fields.
Kibana’s KQL typically targets a specific field. CMK’s query_string without a fields restriction searches everything — which can return far more matches than expected.

3. Index scope.
The default index is _all. If Kibana is scoped to a specific index, the counts will differ.
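To make point 1 concrete, here is a small Python sketch (my own illustration, not plugin code) of how to translate a human time range like Kibana’s “Last 3 days” into the seconds value the rule expects, since the rule value X is sent to Elasticsearch verbatim as now-Xs:

```python
# Illustrative helper: converting a human time range into the seconds
# value for the "Query elasticsearch logs" rule (name is my own choice).

def timerange_seconds(days=0, hours=0, minutes=0):
    """Return the rule value for a given human time range."""
    return days * 86400 + hours * 3600 + minutes * 60

# Kibana's "Last 3 days" corresponds to a rule value of 259200:
print(timerange_seconds(days=3))   # 259200
print(timerange_seconds(hours=1))  # 3600
```

So if the rule contains a small number like 60 while Kibana shows “Last 3 days”, the two counts cannot match.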

To reproduce the exact query CMK sends and compare directly, you can run:

curl -X GET "https://<es-host>:9200/<index>/_count" \
  -H "Content-Type: application/json" \
  -u user:password \
  -d '{
    "query": {
      "bool": {
        "must": [
          {"query_string": {"query": "<your pattern>"}},
          {"range": {"@timestamp": {"gte": "now-<timerange>s", "lt": "now"}}}
        ]
      }
    }
  }'

Could you share your current rule settings?

  • Index name / pattern
  • Search pattern
  • Fieldname (if set)
  • Timerange value
  • The KQL/Lucene query you use in Kibana

That will let us pinpoint exactly where the difference comes from.

Greetz Bernd

Hey, thanks for the reply.

The configuration within CMK:

Search Pattern: agent.name : "XXX" and message : "pmaarch" and _index : logs-XXX-XXX

Indices to query: logs-XXX-XXX

Time range: 3 Days

Those are the same values I used in Kibana. For me it’s hard to understand; I’m not a big Elastic expert. I understand that running a query against the API is different from using Kibana, but it should be possible to get the same results, otherwise the API is useless (and I don’t think that it is).

The point with the field name is something I don’t understand. All the information needed to get the results is already given in the search pattern. Should I also add a field name like “data_stream.dataset = xxx xxx” even if it is not necessary?

Josef

Hi Josef, try this:

1. _index cannot be used inside the search pattern

_index is an Elasticsearch meta-field and cannot be filtered via query_string;
it will be silently ignored. Since you already set “Indices to query: logs-XXX-XXX”
in the rule, the index filter is applied at API level. Simply remove it from the
pattern:

agent.name : "XXX" AND message : "pmaarch"

2. The missing fieldname is likely the main cause of the 450k hits

Without a “Field name” set in the rule, Checkmk searches across all fields (_all).
This means it matches documents where your pattern appears anywhere — not just in
the fields you care about. Kibana, in contrast, applies your KQL query field-specifically.

Since your pattern already contains explicit field references (agent.name : "XXX"),
you can leave the fieldname empty — but the _index part may cause unexpected
behavior. Clean pattern first, then compare.

3. Verify with curl

To confirm the exact query CMK sends:

curl -sk -u user:pass -X POST \
  "https://<es-host>:9200/logs-XXX-XXX/_search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": {"bool": {"must": [
      {"query_string": {"query": "agent.name : \"XXX\" AND message : \"pmaarch\""}},
      {"range": {"@timestamp": {"gte": "now-3d", "lt": "now"}}}
    ]}}
  }'

If this returns 144 results, your rule is correctly set up.
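One small detail when comparing: the two curl examples in this thread use different endpoints, and they report the count in different places in the response. _count returns a top-level "count", while _search (on ES 7+) reports it under "hits.total.value". A quick illustrative sketch in Python (sample responses are made up for demonstration):

```python
import json

# Illustrative only: where the hit count lives in each response shape.
count_response = json.loads('{"count": 144, "_shards": {"total": 1}}')
search_response = json.loads(
    '{"hits": {"total": {"value": 144, "relation": "eq"}}}'
)

print(count_response["count"])                    # 144  (_count endpoint)
print(search_response["hits"]["total"]["value"])  # 144  (_search endpoint)
```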

Best regards

For this point, a separate explanation:

What “Field name” does

The “Field name” setting maps to the default_field parameter in the Elasticsearch query_string query. It tells ES which field to search in when a search term has no explicit field prefix.

Example without field prefix — here default_field matters:

search pattern: "pmaarch"
→ ES searches only in the field you specified as "Field name"

Example with explicit field prefix — here default_field is irrelevant:

search pattern: message : "pmaarch"
→ ES always searches in `message`, regardless of what "Field name" is set to

For your specific case

Your pattern already uses explicit field references everywhere:

agent.name : "XXX" and message : "pmaarch"

So the “Field name” setting has no effect on these parts of the query. You do NOT
need to add data_stream.dataset as a field name — it won’t change anything since your pattern already specifies all fields explicitly.

The only reason to use “Field name” would be if you searched without field prefixes, e.g. just "pmaarch" — then you’d want to restrict the search to `message` to avoid matching that string in unrelated fields.
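To make the mapping concrete, here is a small Python sketch (my own illustration, not plugin code) of how a “Field name” value would land in the query body as default_field:

```python
import json

# Illustrative sketch: how "Field name" maps to default_field in the
# query_string clause. Function and field names are my own examples.

def query_string_clause(pattern, field_name=None):
    clause = {"query_string": {"query": pattern}}
    if field_name:
        # default_field is only consulted for terms WITHOUT an
        # explicit field prefix in the pattern.
        clause["query_string"]["default_field"] = field_name
    return clause

# Bare term: default_field decides where ES searches.
print(json.dumps(query_string_clause('"pmaarch"', field_name="message")))

# Explicit prefix: default_field is irrelevant for this term.
print(json.dumps(query_string_clause('message : "pmaarch"')))
```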

Summary for your setup

  • Leave “Field name” empty (or set to message as a safety net)
  • Remove _index : logs-XXX-XXX from the pattern (meta-field, not supported in query_string)
  • Your pattern should be: agent.name : "XXX" and message : "pmaarch"
  • Index and timerange are already correctly configured

Hey,

thank you for your great and detailed reply :slight_smile:

I’ve already tried this. Without the “_index : logs-XXX-XXX” part I get fewer results (better), but the difference is still 32502 (CMK API) vs. 144 (Kibana).

Adding “Field name: message” doesn’t change anything here. Same 32k result.

For me, it’s a bit like rocket science to find the right syntax. I don’t know if a check should be so complicated that it’s impossible to configure it correctly without reverse‑engineering the entire communication (just my opinion).

Josef

:rofl: Yeah but we are on track … :shuffle_tracks_button:

Problems like this, when you communicate with other systems, APIs, etc., come down to working out how the queries actually behave, or what the other developer had in mind …

Have you tried the curl command above?

I think I have the solution:

agent.name : "XXX" → 28603 | KIBANA: 28603
agent.name : "XXX" and message : "pmaarch" → 32538 | KIBANA: 144
agent.name : "XXX" AND message : "pmaarch" → 144 | KIBANA: 144

So the operators must be in capital letters.

What’s still confusing me: I thought a lowercase “and” would be ignored and the rest of the query not used, but that’s not true. The “and” does something, but I don’t really know what leads to the result of 32538.

Anyway … thank you very much for the great help. I’ll check the syntax with some tests, but I think the query is now correct (and working).

Josef

Hi Josef,

glad you found it! And good question about the lowercase and — here’s what happened:

In Elasticsearch’s Lucene query_string syntax, boolean operators must be uppercase.
AND, OR, NOT are reserved keywords only when written in capitals.

Lowercase and is treated as a plain search term. So your query:

agent.name : "XXX" and message : "pmaarch"

was interpreted by ES as three separate OR-connected terms:

agent.name:"XXX"  OR  "and"  OR  message:"pmaarch"

The word “and” alone matched a huge number of documents (it appears in log messages
everywhere), which explains the 32,538 hits.

With uppercase AND:

agent.name : "XXX" AND message : "pmaarch"

ES correctly requires both conditions to be true → 144 results, matching Kibana.
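A deliberately simplified sketch of the rule (this is my own illustration, not the real Lucene parser, which is considerably more complex): only the uppercase words AND, OR, NOT act as operators; everything else, including lowercase "and", is an ordinary search term.

```python
# Simplified model of Lucene query_string tokenization: only uppercase
# AND / OR / NOT are boolean operators; lowercase "and" is a plain term.

OPERATORS = {"AND", "OR", "NOT"}

def classify(query):
    return [(tok, "operator" if tok in OPERATORS else "term")
            for tok in query.split()]

print(classify('agent.name:"XXX" and message:"pmaarch"'))
# lowercase "and" is a term -> three terms, OR-combined by default

print(classify('agent.name:"XXX" AND message:"pmaarch"'))
# uppercase AND is an operator -> both conditions required
```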

Bernd

2 Likes