Weird graph/metric behaviour upon making changes

CMK version: Checkmk Raw Edition 2.1.0p20
OS version: Ubuntu Jammy

Hi team, I’ve written a custom check plugin following the basic tutorial. I’ve also defined metrics (metric_info) whose names match those passed to ‘yield Metric()’, and graphs (graph_info) that reference those metrics, in an attempt to combine several metrics into a single graph.

My issue is that each metric still shows up in its own individual graph (the default behaviour), and I’m wondering what I might have missed. Do the files in lib/check_mk/gui/plugins/metrics/ perhaps need to be reloaded somehow?

Thanks in advance.

-Phil

Perhaps it helps to see the service, metric and graph definitions:

Service:

from .agent_based_api.v1 import Metric, Result, Service, State, register, render

def discover_oracle_mem(section):
    yield Service()

def check_oracle_mem(section):
    # Each section line is [key, value]; one metric and one result per known key.
    for line in section:
        if line[0] == "total_processes":
            yield Metric("oracle_sga_total_processes", float(line[1]))
            yield Result(state=State.OK, summary=f"{line[0]} is {float(line[1])}")
        elif line[0] == "w3wp_processes":
            yield Metric("oracle_sga_w3wp_processes", float(line[1]))
            yield Result(state=State.OK, summary=f"{line[0]} is {float(line[1])}")
        elif line[0] == "other_processes":
            yield Metric("oracle_sga_nonw3wp_processes", float(line[1]))
            yield Result(state=State.OK, summary=f"{line[0]} is {float(line[1])}")
        elif line[0] == "total_mbsize":
            yield Metric("oracle_sga_total_mbsize", float(line[1]))
            yield Result(state=State.OK, summary=f"{line[0]} is {render.bytes(float(line[1]) * 1048576)}")
        elif line[0] == "w3wp_mbsize":
            yield Metric("oracle_sga_w3wp_mbsize", float(line[1]))
            yield Result(state=State.OK, summary=f"{line[0]} is {render.bytes(float(line[1]) * 1048576)}")
        elif line[0] == "other_mbsize":
            yield Metric("oracle_sga_nonw3wp_mbsize", float(line[1]))
            yield Result(state=State.OK, summary=f"{line[0]} is {render.bytes(float(line[1]) * 1048576)}")

register.check_plugin(
    name="check_oracle_mem",
    service_name="Oracle PGA-SGA",
    discovery_function=discover_oracle_mem,
    check_function=check_oracle_mem,
)
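
The check function above repeats the same pair of yields for every key, and the size values reach the metric in MB while the summary renders them as bytes. A minimal sketch of a table-driven variant is below. It is standalone and uses plain tuples instead of the Checkmk Metric and Result classes so it can run outside a site; the names METRIC_MAP and iter_oracle_mem_metrics, and the choice to convert MB to bytes before emitting the value, are assumptions for illustration and not part of the original plugin.

```python
# Hypothetical standalone sketch; plain tuples stand in for Checkmk's Metric objects.
MIB = 1024 * 1024

# Section key -> (metric name, factor applied to the raw value).
# The *_mbsize factors assume the agent section reports sizes in MB.
METRIC_MAP = {
    "total_processes": ("oracle_sga_total_processes", 1),
    "w3wp_processes": ("oracle_sga_w3wp_processes", 1),
    "other_processes": ("oracle_sga_nonw3wp_processes", 1),
    "total_mbsize": ("oracle_sga_total_mbsize", MIB),
    "w3wp_mbsize": ("oracle_sga_w3wp_mbsize", MIB),
    "other_mbsize": ("oracle_sga_nonw3wp_mbsize", MIB),
}


def iter_oracle_mem_metrics(section):
    """Yield (metric_name, value) pairs for every known section line."""
    for line in section:
        # Skip malformed lines and keys that have no metric mapping.
        if len(line) < 2 or line[0] not in METRIC_MAP:
            continue
        name, factor = METRIC_MAP[line[0]]
        yield name, float(line[1]) * factor


sample_section = [
    ["total_processes", "42"],
    ["total_mbsize", "512"],
    ["unknown_key", "1"],
]
metrics = list(iter_oracle_mem_metrics(sample_section))
print(metrics)
```

If the section really does report MB, emitting bytes like this would line up with the "unit": "bytes" used in the metric definitions; otherwise the factor can simply be left at 1.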

Metric and graph:

from cmk.gui.i18n import _
from cmk.gui.plugins.metrics.utils import graph_info, metric_info

metric_info["oracle_sga_total_processes"] = {
    "title": _("SGA Total Processes"),
    "unit": "count",
    "color": "#80f000",
}

metric_info["oracle_sga_w3wp_processes"] = {
    "title": _("SGA w3wp Processes"),
    "unit": "count",
    "color": "#80f000",
}

metric_info["oracle_sga_nonw3wp_processes"] = {
    "title": _("SGA non-w3wp Processes"),
    "unit": "count",
    "color": "#80f000",
}

metric_info["oracle_sga_total_mbsize"] = {
    "title": _("SGA Total Size"),
    "unit": "bytes",
    "color": "#80f000",
}

metric_info["oracle_sga_w3wp_mbsize"] = {
    "title": _("SGA w3wp Size"),
    "unit": "bytes",
    "color": "#80f000",
}

metric_info["oracle_sga_nonw3wp_mbsize"] = {
    "title": _("SGA non-w3wp Size"),
    "unit": "bytes",
    "color": "#80f000",
}

graph_info["oracle_sga_process_count"] = {
    "title": _("Process Count in Oracle SGA"),
    "metrics": [
        ("oracle_sga_total_processes", "area"),
        ("oracle_sga_w3wp_processes", "line"),
        ("oracle_sga_nonw3wp_processes", "line"),
    ],
}

graph_info["oracle_sga_process_size"] = {
    "title": _("Process Consumption in Oracle SGA"),
    "metrics": [
        ("oracle_sga_total_mbsize", "area"),
        ("oracle_sga_w3wp_mbsize", "line"),
        ("oracle_sga_nonw3wp_mbsize", "line"),
    ],
}
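
Before suspecting the GUI, it can help to rule out a simple naming mismatch between the graph and metric definitions. The sketch below is a standalone check that uses plain dicts as stand-ins for cmk.gui's metric_info and graph_info; the helper undefined_graph_metrics is hypothetical, and it assumes every graph entry is a plain metric name rather than an expression.

```python
# Standalone sketch: plain dicts stand in for cmk.gui's metric_info / graph_info.
metric_info = {
    name: {}
    for name in (
        "oracle_sga_total_processes",
        "oracle_sga_w3wp_processes",
        "oracle_sga_nonw3wp_processes",
        "oracle_sga_total_mbsize",
        "oracle_sga_w3wp_mbsize",
        "oracle_sga_nonw3wp_mbsize",
    )
}

graph_info = {
    "oracle_sga_process_count": {
        "metrics": [
            ("oracle_sga_total_processes", "area"),
            ("oracle_sga_w3wp_processes", "line"),
            ("oracle_sga_nonw3wp_processes", "line"),
        ],
    },
    "oracle_sga_process_size": {
        "metrics": [
            ("oracle_sga_total_mbsize", "area"),
            ("oracle_sga_w3wp_mbsize", "line"),
            ("oracle_sga_nonw3wp_mbsize", "line"),
        ],
    },
}


def undefined_graph_metrics(graphs, metrics):
    """Return (graph_id, metric_name) pairs whose metric has no definition."""
    missing = []
    for graph_id, graph in graphs.items():
        for metric_name, _line_type in graph["metrics"]:
            if metric_name not in metrics:
                missing.append((graph_id, metric_name))
    return missing


missing = undefined_graph_metrics(graph_info, metric_info)
print(missing)
```

An empty list here means every metric referenced by a graph has a matching definition, so a mismatch in names is not the cause.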

Hey team, checking in after pulling my hair out all day over this. It appears my config above was fine and did work… EVENTUALLY. I finally discovered this when I went to show a colleague at the end of the day, only to find the metrics combined in a graph!

Then, the really weird bit: when the service screen auto-refreshed, the graphs went back to non-combined! With each further refresh, the ‘combined metrics’ graphs show up only intermittently.

Has anyone experienced this behaviour before?
I have executed cmk -R without any improvement.

$ cmk -R
Generating configuration for core (type nagios)...
Precompiling host checks...OK
Validating Nagios configuration...OK
Restarting monitoring core...OK
$

OK, solved eventually - posting for everyone else’s benefit.
To make changes to ‘metrics’ take effect reliably (until then, the graphs only intermittently display as defined), you need to run:

omd restart apache

Enjoy!
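
One plausible explanation for the intermittent display, offered here as an assumption rather than something confirmed in this thread: the GUI is served by several long-running Apache worker processes, each of which loads the metric and graph definitions once at startup, while cmk -R only restarts the monitoring core. Requests then land on a mix of workers with old and new definitions until all of them are restarted. The toy simulation below sketches that idea; the Worker class and the round-robin dispatch are purely illustrative and do not reflect how Apache actually schedules requests.

```python
import itertools

GRAPH_ID = "oracle_sga_process_count"


class Worker:
    """Toy stand-in for a GUI worker that snapshots definitions at startup."""

    def __init__(self, definitions):
        # Copy once at start; later changes on disk are not seen by this worker.
        self.loaded = dict(definitions)

    def render(self, graph_id):
        return "combined" if graph_id in self.loaded else "individual"


def serve(workers, requests):
    """Dispatch requests round-robin and collect what each one renders."""
    round_robin = itertools.cycle(workers)
    return [next(round_robin).render(GRAPH_ID) for _ in range(requests)]


on_disk = {}                                   # no custom graph defined yet
workers = [Worker(on_disk) for _ in range(3)]  # workers start with old state

on_disk[GRAPH_ID] = {"metrics": []}            # the graph definition is added
workers[1] = Worker(on_disk)                   # only one worker gets recycled

before_restart = serve(workers, 6)
print(before_restart)

workers = [Worker(on_disk) for _ in workers]   # comparable to restarting all workers
after_restart = serve(workers, 6)
print(after_restart)
```

In this model the combined graph appears only when a request happens to hit the recycled worker, which matches the flip-flopping seen on refresh, and it appears consistently once every worker has been restarted.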
