Datasource agent for Prometheus node exporter

Hello community,

I am currently working on monitoring for our new EDI system, Seeburger BIS v6.7.
I was told by the Seeburger consultant that they implemented an exporter for Prometheus and that this would be their preferred way of monitoring.
I don’t want to install a Prometheus instance just for this.
As the node exporter is reachable via HTTP/HTTPS and its output seems to be in a structure that could be parsed, I am wondering whether someone has already tried to develop a datasource agent to acquire the data from a node exporter.
At least in the known places I found nothing useful.
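I imagine something roughly like this minimal datasource program sketch (the port 9100 and the section name prometheus_node_exporter are just assumptions of mine; a check plugin for that section would still have to be written):

#!/usr/bin/env python3
# Minimal sketch of a datasource program that fetches node exporter output
# over HTTP and wraps it in a (made-up) Checkmk agent section.
import sys
import urllib.request

def main() -> int:
    url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:9100/metrics"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            text = response.read().decode("utf-8")
    except OSError as exc:
        sys.stderr.write(f"Cannot reach exporter at {url}: {exc}\n")
        return 1
    print("<<<prometheus_node_exporter>>>")
    print(text, end="")
    return 0

if __name__ == "__main__":
    sys.exit(main())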

Thanks for your hints/thoughts on this.

regards

Michael

Am I wrong, or does the documentation explicitly talk about a connection to a node exporter?
From your description and the documentation it seems to be exactly what you are looking for.
(I linked the German version :slight_smile:)

Thanks tosch, but as far as I understood it, this connects to the Prometheus server and not to the node itself. The config option just defines the initial data source.
Nevertheless I will have a look at the code to see if I find something useful.

Maybe @martin.hirschvogel could give you some hints here. He seems to be the expert on this topic.

@mike1098 already mentioned it correctly: there is no native connection to a node exporter.
However, if you look into the new Kubernetes special agent, you will see methods for parsing Prometheus metrics, because that is what happens there extensively.
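For illustration only (this is not the code used inside the Kubernetes special agent), the official prometheus_client library can already parse the text exposition format:

import urllib.request
# Illustration only, not the Checkmk-internal code: parse the Prometheus
# text exposition format with the official prometheus_client library.
from prometheus_client.parser import text_string_to_metric_families

url = "http://localhost:9100/metrics"  # assumed node exporter endpoint
text = urllib.request.urlopen(url, timeout=10).read().decode("utf-8")

for family in text_string_to_metric_families(text):
    for sample in family.samples:
        # each sample carries a name, a label dict and a float value
        print(sample.name, sample.labels, sample.value)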

Isn’t the OpenMetrics spec used by Prometheus fundamentally different from what Checkmk needs?

AFAIK you only get metrics but no semantics. You would need to implement the check logic yourself, but the metrics presented may be too dynamic.

Maybe you can explain this to a poor schoolboy like me :wink:
If I do a curl to the node exporter, I mainly get key/value pairs:

# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.562352e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.562352e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.445247e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 570

For some metrics we also see items in the key:

# TYPE node_cpu_scaling_frequency_max_hertz gauge
node_cpu_scaling_frequency_max_hertz{cpu="0"} 2.8e+09
node_cpu_scaling_frequency_max_hertz{cpu="1"} 2.8e+09
node_cpu_scaling_frequency_max_hertz{cpu="2"} 2.8e+09
node_cpu_scaling_frequency_max_hertz{cpu="3"} 2.8e+09
# HELP node_cpu_scaling_frequency_min_hertz Minimum scaled CPU thread frequency in hertz.
# TYPE node_cpu_scaling_frequency_min_hertz gauge
node_cpu_scaling_frequency_min_hertz{cpu="0"} 1.2e+09
node_cpu_scaling_frequency_min_hertz{cpu="1"} 1.2e+09
node_cpu_scaling_frequency_min_hertz{cpu="2"} 1.2e+09
node_cpu_scaling_frequency_min_hertz{cpu="3"} 1.2e+09

Looks pretty parsable to me, but maybe I overlooked something.
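A quick hand-rolled sketch of what I mean (illustration only, real exposition parsing has more corner cases): split such a line into metric name, labels and value, where the labels would be natural candidates for Checkmk items:

import re

# Illustration only: split one exposition line such as
#   node_cpu_scaling_frequency_max_hertz{cpu="0"} 2.8e+09
# into (name, labels, value). Real parsing has more corner cases
# (escaping, timestamps), so this is just a sketch.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)

def parse_line(line):
    match = LINE_RE.match(line.strip())
    if not match:
        return None  # comment lines (# HELP / # TYPE) do not match
    labels = {}
    if match.group("labels"):
        for part in match.group("labels").split(","):
            key, _, raw = part.partition("=")
            labels[key.strip()] = raw.strip().strip('"')
    return match.group("name"), labels, float(match.group("value"))

print(parse_line('node_cpu_scaling_frequency_max_hertz{cpu="0"} 2.8e+09'))
# -> ('node_cpu_scaling_frequency_max_hertz', {'cpu': '0'}, 2800000000.0)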

Correct, these are just metrics. There are no semantics like thresholds for WARN or CRIT, nor any other correlations between the individual metric values.

You would need to write a check plugin that implements the specific check logic for the collection of metrics you get from this specific exporter.

I do not think that there can be a generic solution to just connect to any exporter and get anything useful out of it.
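Just to illustrate what I mean by check logic: a plugin built on the agent-based API v1 could look roughly like this sketch; the section name, the chosen metric and the thresholds are all invented here:

# Rough sketch of a check plugin adding semantics (WARN/CRIT thresholds) on
# top of raw exporter metrics. Section name, metric and thresholds are
# invented for illustration.
from cmk.base.plugins.agent_based.agent_based_api.v1 import (
    Metric,
    Result,
    Service,
    State,
    register,
)

def parse_prometheus_node_exporter(string_table):
    # Keep only simple "name value" lines, ignore HELP/TYPE and labeled samples.
    section = {}
    for row in string_table:
        if len(row) == 2 and not row[0].startswith("#") and "{" not in row[0]:
            try:
                section[row[0]] = float(row[1])
            except ValueError:
                pass
    return section

def discover_go_goroutines(section):
    if "go_goroutines" in section:
        yield Service()

def check_go_goroutines(section):
    value = section.get("go_goroutines")
    if value is None:
        return
    state = State.OK
    if value >= 500:        # invented CRIT threshold
        state = State.CRIT
    elif value >= 200:      # invented WARN threshold
        state = State.WARN
    yield Result(state=state, summary=f"{value:.0f} goroutines")
    yield Metric("go_goroutines", value)

register.agent_section(
    name="prometheus_node_exporter",
    parse_function=parse_prometheus_node_exporter,
)

register.check_plugin(
    name="prometheus_node_exporter_goroutines",
    sections=["prometheus_node_exporter"],
    service_name="Node exporter goroutines",
    discovery_function=discover_go_goroutines,
    check_function=check_go_goroutines,
)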

In Prometheus you also have to write queries to get alerts, am I right?

@r.sander Yes, examples for alerts can be found in Awesome Prometheus alerts | Collection of alerting rules

By the way, I think the metrics you get from the Prometheus exporters are pretty much identical to how Prometheus stores them. So the Prometheus special agent is actually the better source.
Example: checkmk/agent_prometheus.py at master · tribe29/checkmk · GitHub

Did you create a custom datasource program for the Prometheus output?

I have a similar case: we have a Sonatype Nexus application which has an API for metrics. It serves a JSON format as well as a Prometheus format, neither of which is currently supported by Checkmk.
Reference:

The Prometheus format seems easier, since Checkmk would already know how to handle this data format from the node exporter functionality; it just needs some customization to handle the data directly.

Hi, this project was recently restarted, so stay tuned.

I have encountered even more applications that would benefit from being able to handle Prometheus exporter data:

  • Sonatype Nexus
  • Jenkins with Prometheus plugin
  • SonarQube
  • GitLab
  • RabbitMQ with rabbitmq_prometheus plugin
  • KeyCloak

This is useful when it is not possible to install a Checkmk agent, but another benefit of using the Prometheus data is that it also contains many application-specific checks, for instance a license expiry check in SonarQube.

There is already a feature request for Prometheus integration, but that may take some time, so I will definitely stay tuned for any updates from your side.

Also make sure you upvote the feature request:

@martin.hirschvogel
We observe that more and more applications and solutions offer their metrics in Prometheus format and there are also plenty of exporters out there to monitor all kinds of things.

I am convinced that a generic Prometheus metrics parser (similar to mk_jolokia) would be a big benefit for Checkmk, because then we could monitor more or less anything that offers its metrics in Prometheus format without needing Prometheus at all.

When a team starts using Prometheus because Checkmk couldn’t provide a simple solution for querying their metrics, the question quickly arises: why maintain two monitoring solutions when you can monitor everything with Prometheus?

This is on the radar and at the top of my personal wish list for Checkmk 2.4 (however, rather in the context of OpenTelemetry).

I am afraid that by 2.4 (2025) most of our customers who have applications with Prometheus metrics will have already switched to Prometheus and will not return to Checkmk.

Having seen many people switch from Prometheus to Checkmk as well, because PromQL is terrible if you don’t spend the entire day on it, I am not entirely convinced, but I also understand the urgency.
A backport to 2.3 should be reasonable and address some of your concerns.
Keep in mind that on the other side we have to ensure that Checkmk core functionality is well maintained, which currently includes a rewrite of check_http, MS SQL monitoring, NetApp monitoring, developer APIs (so that MKPs don’t break on updates as easily anymore), and adaptations to livestatus for building views/graphs/dashboards independent of instance size.
All things I prefer not to strike off the list…

The sniclient only covers basic measurements. These are already part of the Checkmk agent (OK, except where the agent cannot be installed).

nice:
go_goroutines 7

not so nice:
go_gc_duration_seconds{quantile="0.75"} 0.000276884

uuhh:
node_filesystem_files{device="zroot/ROOT/default",fstype="zfs",mountpoint="/"} 1.069231e+07

I think it would be much more useful in cases where an exporter provides data for which we do not have an agent plugin.
I’m dreaming of a solution where one can enter a keyword (a line in the exporter data), a unit, and a field describing how to calculate things with the given value.
This might not be useful when JSON data or whatever else is provided by the exporter. But…
Next dream:
If the community had a kind of interface somewhere (GitHub?) where these tuples (keyword, unit, command) could be entered, we could build up a fast-growing database which could be used in a Checkmk datasource for exporters.
Is that understandable?
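Perhaps a minimal sketch makes the idea clearer; all keywords, units and thresholds below are made-up examples, and it simply emits Checkmk local check output instead of being a full datasource:

#!/usr/bin/env python3
# Sketch of the "keyword + unit + thresholds" idea: a small table maps
# exporter keywords to a service name, a unit and thresholds, and the
# script emits Checkmk local check output for each match.
# Keywords, units and thresholds are made-up examples.
import sys
import urllib.request

RULES = {
    # keyword in exporter output: (service name, unit, warn, crit)
    "go_goroutines": ("Goroutines", "", 200, 500),
    "node_filefd_allocated": ("Filedescriptors", "", 50000, 80000),
}

def main() -> int:
    url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:9100/metrics"
    text = urllib.request.urlopen(url, timeout=10).read().decode("utf-8")
    for line in text.splitlines():
        if line.startswith("#") or " " not in line:
            continue
        keyword, raw_value = line.split(" ", 1)
        rule = RULES.get(keyword)
        if rule is None:
            continue
        service, unit, warn, crit = rule
        value = float(raw_value)
        status = 0 if value < warn else (1 if value < crit else 2)
        print(f"{status} {service} {keyword}={value};{warn};{crit} "
              f"{value}{unit} (warn/crit at {warn}/{crit})")
    return 0

if __name__ == "__main__":
    sys.exit(main())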

Why not start with the easy ones?