Metrics with livestatus

gera83 · August 4, 2020, 10:03pm

I’m sorry, but this is not clear to me.
I love Linux, I love scripting, I love Bash.

But, what is this?

OMD[prod2]:~$ for i in $(lq "GET services\nFilter: host_name = ESX1\nFilter: service_description = Uptime\nColumns: rrddata:m1:uptime.max,1,*:1593883300:1596568904:1\nOutputFormat: json" | tr "," " "); do echo $i; done | sort -n
[[[1593878400
null
null]]]
1.00586e+06
1.01303e+06
1.02024e+06
1.02744e+06
1.03465e+06
1.04183e+06
1.04904e+06
1.05624e+06
1.06344e+06
1.07065e+06
1.07783e+06
1.08504e+06
1.09224e+06
1.09944e+06
1.10665e+06
1.11386e+06
1.12103e+06
1.12824e+06
1.13544e+06
1.14265e+06
1.14983e+06
1.15704e+06
1.16424e+06
1.17144e+06
1.17865e+06
1.18583e+06
1.19303e+06
1.20024e+06
1.20744e+06
1.21464e+06
1.22183e+06
1.22903e+06
1.23624e+06
1.24344e+06
1.25065e+06
1.25783e+06
1.26503e+06
1.27224e+06
1.27944e+06
1.28664e+06
1.29386e+06
1.30103e+06
1.30824e+06
1.31544e+06
1.32264e+06
1.32983e+06
1.33703e+06
1.34424e+06
1.35144e+06
1.35864e+06
1.36587e+06
1.37303e+06
1.38024e+06
1.38744e+06
1.39464e+06
1.4018e+06
1.40903e+06
1.41624e+06
1.42344e+06
1.43064e+06
1.43786e+06
1.44503e+06
1.45224e+06
1.45944e+06
1.46664e+06
1.47383e+06
1.48103e+06
1.48824e+06
1.49544e+06
1.50264e+06
1.50983e+06
1.51703e+06
2.46866e+07
2.46938e+07
2.4701e+07
2.47082e+07
2.47154e+07
2.47226e+07
2.47298e+07
2.4737e+07
2.47442e+07
2.47514e+07
2.47586e+07
2.47658e+07
2.4773e+07
2.47802e+07
2.47874e+07
2.47946e+07
2.48018e+07
2.4809e+07
2.48162e+07
2.48234e+07
2.48306e+07
2.48378e+07
2.4845e+07
2.48522e+07
2.48594e+07
2.48666e+07
2.48738e+07
2.4881e+07
2.48882e+07
2.48954e+07
2.49026e+07
2.49098e+07
2.4917e+07
2.49242e+07
2.49314e+07
2.49386e+07
2.49458e+07
2.4953e+07
2.49602e+07
2.49674e+07
2.49724e+07
4096
7200
11311
12240
18507
19455
25704
26652
32901
33848
40097
41045
47293
48242
54503.3
55438
61707
62655
68904
69851
76100
77045
83297
84239
90494
91442
97696.7
98648
104908
105853
112104
113040
119298
120245
126495
127451
133692
134666
140908
141842
148103
149048
155300
156254
162497
163440
169693
170645
176909
177852
184105
185037
191302
192243
198498
199449
205694
206654
212891
213841
220108
221045
227302
228250
234498
235435
241694
242642
248930
249849
256105
257041
263303
264241
270499
271446
277695
278653
284891
285839
292107
293045
299304
300251
306499
307437
313694
314642
320889
321848
328104
329047
335301
336239
342497
343444
349693
350650
356918
357834
364106
365040
371302
372246
378498
379452
385692
386637
392895
393843
400104
401049
407300
408235
414495
415441
421691
422645
428900
429851
436102
437037
443297
444242
450494
451446
457690
458649
464907
465848
472103
473036
479297
480241
486493
487444
493728
494644
500904
501849
508101
509062
515296
516238
522491
523443
529688
530647
536903
537852
544100
545036
551296
552241
558493
559446
565690
566650
572905
573834
580100
581038
587296
588243
594493
595448
601688
602639
608904
609837
616100
617041
623296
624244
630491
631448
637687
638632
644904
645837
652100
653041
659295
660245
666492
667450
673688
674634
680905
681838
688100
689042
695297
696247
702492
703412
709716
710636
716902
717840
724099
725044
731295
732249
738490
739433
745686
746638
752903
753842
760098
761045
767293
768243
774489
775434
781704
782639
788900
789844
796095
797048
803292
804232
810488
811436
817702
818641
824896
825846
832092
833050
839288
840235
846503
847440
853699
854644
860894
861849
862214
869032
876237
883441
890646
897800
905034
912238
919442
926646
933860
941034
948239
955441
962645
969800
977034
984238
991442
998647
1596571200

Is that MHz for the last 30 days?
Why uptime?

Sorry, I did not understand.

Thanks!!!

Dirk · August 5, 2020, 7:27am

I read the question as “Why don’t I get MHz when I ask for Uptime?”, but maybe I’m wrong.

The query

lq "GET services
Filter: host_name = ESX1
Filter: service_description = Uptime
Columns: rrddata:m1:uptime.max,1,*:1593883300:1596568904:1
OutputFormat: json"

fetches the performance data of

host ESX1 and
service Uptime between
Sat Jul 4 19:21:40 CEST 2020 (1593883300) and
Tue Aug 4 21:21:44 CEST 2020 (1596568904)

If you want to fetch the MHz, then you obviously have to pick another service – preferably one that shows MHz values, maybe ESX CPU?

The code snippet you refer to, posted here by @andreas-doehler, was just an example of how to fetch such historical data in general.
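For orientation, the column spec in the query above appears to follow the pattern rrddata:<label>:<RPN expression>:<start>:<end>:<resolution>, and the reply begins with three header numbers (start, end, step) before the actual data points; 1593878400, 1596571200 and 7200 are visible in the output of the first post. The following is a minimal sketch under that assumption. split_rrddata is a hypothetical helper name, and the sample reply is shortened, made-up data standing in for a live lq call.

```shell
#!/usr/bin/env bash
# Split an rrddata reply (JSON on stdin) into header fields and data points.
# Assumption: one service and one rrddata column, so the reply is a single
# nested list of the form [[[start,end,step,v1,v2,...]]].
# split_rrddata is a hypothetical helper, not part of Checkmk.
split_rrddata() {
    tr -d '[] \n' | tr ',' '\n' | awk '
        NR == 1 { printf "start=%s\n", $0; next }
        NR == 2 { printf "end=%s\n",   $0; next }
        NR == 3 { printf "step=%s\n",  $0; next }
                { printf "value=%s\n", $0 }'
}

# Shortened sample reply (illustrative values only)
sample='[[[1593878400,1596571200,7200,null,4096,11311]]]'
printf '%s' "$sample" | split_rrddata
```

On an OMD site the same function could be fed from the real query, for example lq "GET services ..." | split_rrddata.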

gera83 · August 6, 2020, 5:26pm

Hi there. Yes, ok. It was just in case.
No secrets. Ok.

So… to business.

OMD[prod2]:~/var/check_mk/rrd/ESX1$ ls | grep -i cpu
CPU_utilization.info
CPU_utilization.rrd

Query:
lq "GET services
Filter: host_name = ESX1
Filter: service_description = CPU utilization
Columns: rrddata:m1:CPU_utilization.max,1,*:$(date +%s -d"a month ago"):$(date +%s):1"

I get: 0,0,0

What am I doing wrong?

Dirk · August 6, 2020, 5:50pm

You are using the wrong performance data name: CPU_utilization is the name of the RRD file, not of a metric. You want to query a specific performance counter of a given service. In your case the service is CPU utilization, and that service has (at least) 6 different performance counters. You can see them either by running

OMD[stest]:~/var/check_mk/rrd/localhost$ cat CPU_utilization.info
HOST localhost
SERVICE CPU utilization
METRICS wait;guest;user;steal;system;util;cpu_core_util_3;cpu_core_util_2;cpu_core_util_0;cpu_core_util_1

or by looking in the GUI (on the service’s detail page).

Pick the performance counter you are interested in (e.g. user) and then:

OMD[stest]:~/var/check_mk/rrd/localhost$ lq "GET services
Filter: host_name = localhost
Filter: service_description = CPU utilization
Columns: rrddata:m1:user.max,1,*:1593883300:1596568904:1
OutputFormat: json"
[[[1593878400,1596571200,7200, ..., 38.1785,27.5616,...]]]
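The lookup of metric names described above can be scripted. This is a small sketch, assuming the .info file has the same layout as the one shown (a METRICS line with names separated by semicolons); list_metrics is a hypothetical helper name and the demo file is a temporary copy of that layout.

```shell
#!/usr/bin/env bash
# Print one metric name per line from a <service>.info file.
# list_metrics is a hypothetical helper, not part of Checkmk.
list_metrics() {
    # $1: path to the .info file
    sed -n 's/^METRICS //p' "$1" | tr ';' '\n'
}

# Demo with a temporary file that mimics the layout shown above
info_file=$(mktemp)
printf 'HOST localhost\nSERVICE CPU utilization\nMETRICS wait;guest;user\n' > "$info_file"
list_metrics "$info_file"
rm -f "$info_file"
```

Each printed name can then be used in the rrddata column, for example user.max or util.max.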

gera83 · August 6, 2020, 5:53pm

OK.

OMD[prod2]:~/var/check_mk/rrd/ESX1$ grep METRICS CPU_utilization.info
METRICS util

lq "GET services
Filter: host_name = ESX1
Filter: service_description = CPU utilization
Columns: rrddata:m1:util.max,1,*:$(date +%s -d"a month ago"):$(date +%s):1"

Now I’m getting data.
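Because the nested quotes in such a query are easy to get wrong, the query text can also be assembled with printf. This is only a sketch: build_rrd_query is a hypothetical helper name, GNU date is assumed for the -d option, and the resolution is fixed to 1 as in the queries in this thread.

```shell
#!/usr/bin/env bash
# Assemble the Livestatus query text for one metric over the last month.
# build_rrd_query is a hypothetical helper, not part of Checkmk.
build_rrd_query() {
    # $1 host, $2 service description, $3 metric, $4 max|min|average
    local start end
    start=$(date +%s -d '1 month ago')   # GNU date syntax (assumption)
    end=$(date +%s)
    printf 'GET services\nFilter: host_name = %s\nFilter: service_description = %s\nColumns: rrddata:m1:%s.%s,1,*:%s:%s:1\nOutputFormat: json\n' \
        "$1" "$2" "$3" "$4" "$start" "$end"
}

# Print the query text for inspection
build_rrd_query ESX1 'CPU utilization' util max
```

On an OMD site the printed text could presumably be piped straight into lq, which also reads a query from standard input.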

So, can you confirm 4 things?

grep METRICS xxxx.info gives me the names of the RRD metrics.
The possible consolidation values are max, min and average (not avg).
What’s the difference between:

for i in $(lq "GET services\nFilter: host_name = ESX1\nFilter: service_description ~ CPU utilization\nColumns: rrddata:m1:util.average,1,*:$(date +%s -d"a month ago"):$(date +%s):1" | sed -e 's/,/ /g'); do echo $i; done | sort -n | grep "." | tail -1

And…

GET services
Filter: description = CPU utilization
Filter: host_name = ESX1
Filter: time > $(date +%s -d"a month ago")
Filter: time = $(date +%s)
Stats: sum perf_data

With the first (rrddata column), I’m getting a lot of data: not the maximum value, not the min, not the average. I guess I’m getting a series of maximum values, and likewise for min and average. What’s the logic behind that? I’m asking for the maximum value over the last 30 days.

Thanks for your time!

andreas-doehler · August 6, 2020, 9:17pm

What you get with the rrddata query is the raw data series and nothing else. You then have to parse the many values you get back and pick the maximum yourself.
Please don’t confuse the max function of the RRD query with the maximum value over a specific time period.
The RRD min/max/average functions only aggregate a given number of data points within a specific time frame.
In your case I would use the max function to fetch the RRD data and then search for the maximum value in the result. If you want to build a sum over the last month, it is better to fetch the RRD values with average and then add all values together.

To understand all this a little better, I recommend the RRD tutorial by Alex van den Bogaerdt. The parts about consolidation and resampling are well worth reading.
See RRDtool - rrdtutorial or http://rrdtool.vandenbogaerdt.nl/
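The post-processing described above can be sketched with awk. The sketch assumes the reply starts with the three header numbers (start, end, step) seen earlier in the thread and that gaps are reported as null; rrd_summary is a hypothetical helper name and the sample values are illustrative. Note that sort -n does not parse values such as 1.00586e+06, which is why the listing in the first post is out of order; awk (or sort -g) handles that notation.

```shell
#!/usr/bin/env bash
# Summarise an rrddata reply (JSON on stdin): count, min, max and average
# of the data points, ignoring the header and null gaps.
# rrd_summary is a hypothetical helper, not part of Checkmk.
rrd_summary() {
    tr -d '[] \n' | tr ',' '\n' | awk '
        NR <= 3 || $0 == "null" || $0 == "" { next }  # skip header and gaps
        {
            v = $0 + 0                     # also parses 1.00586e+06
            if (n == 0 || v > max) max = v
            if (n == 0 || v < min) min = v
            sum += v
            n++
        }
        END {
            if (n) printf "n=%d min=%g max=%g avg=%g\n", n, min, max, sum / n
        }'
}

# Shortened sample reply (illustrative values only)
sample='[[[1593878400,1596571200,7200,null,12.5,38.1785,27.5616,null]]]'
printf '%s' "$sample" | rrd_summary
```

Fetching the series with the max function and reading the max field gives the peak over the period; fetching it with average and reading avg (or summing the values) covers the other case mentioned above.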
