HTTP Response Time Graph Does not reflect "Socket time out" intervals

Hi,

I wonder why the response time graph of the HTTP active check does not show a zero value when the endpoint stops responding.

  1. From 13:15 to 13:30 HTTP check returned “Socket time-out”

  2. But the response time graph keeps showing a response time of 7.39 ms

This is confusing: the graph says there is still a response from the other side, while the event log says the opposite.

Is that normal behaviour of the graphing module?

BR

Curious myself, I added "Services: Service Metrics" as a column to the view.

None of the response times in Summary match the times in Metrics. It is almost as if they measure different things. The size seems to match, though.

I had the same question some time ago in "Missing data points". No answer so far… I consider this a bug: the graph does not represent what happened at all.

Please, can someone from Checkmk tell us why the graphs keep the last value when no data points are being collected? This does not reflect the real situation. If no data points are being collected, the graph should be empty (zero value) for the time interval without data.

BR.

This behavior has to do with the heartbeat feature of RRDtool.
The heartbeat is the time span that must pass without an update before RRDtool shows a gap (unknown values) in the graph. Keep in mind that on a "Socket timeout", as shown in the first screenshot, no value is written for the performance data.
No value does not mean 0; it really means no value.

From the page RRDtool - Example - rrdtool update

interval
RRDtool will normalize the rates it computed (or got from you, in case of a GAUGE data source type) into what you specified as the step size. But you don't need to update this often if you don't want to.

You need to understand the heartbeat value. This value determines if an update is fresh enough, see rrdtool create if you missed it.

If your heartbeat value allows it, you can span many steps with one single update:

rrdtool update 1235775600:0
rrdtool update 1235818800:1
The time difference is 43200 seconds. If heartbeat is at least that number, then all time slots between 1235775600 and 1235818800 are filled with rate 1. Else, the same time slots will be unknown.
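
The rule quoted above can be sketched in a few lines of Python. This is only a simplified model, not RRDtool's actual code: the function name `fill_slots` is made up for illustration, timestamps are assumed to fall on step boundaries, and the step of 60 seconds is taken from the `rrdtool info` output shown later in this thread.

```python
def fill_slots(prev_ts, new_ts, value, heartbeat, step):
    """Model of the quoted rule for a GAUGE data source.

    Returns one (slot_end_time, rate) pair per step between two updates.
    The rate is None (unknown) when the gap exceeds the heartbeat.
    """
    gap = new_ts - prev_ts
    known = gap <= heartbeat  # "if heartbeat is at least that number"
    slots = []
    t = prev_ts + step
    while t <= new_ts:
        slots.append((t, value if known else None))
        t += step
    return slots


# The two updates from the quoted example, 43200 seconds apart.
filled = fill_slots(1235775600, 1235818800, 1.0, heartbeat=43200, step=60)
unknown = fill_slots(1235775600, 1235818800, 1.0, heartbeat=8460, step=60)

# A 15-minute outage (900 s) is far below a heartbeat of 8460 s,
# so in this model its slots are still treated as known.
outage = fill_slots(0, 900, 7.39, heartbeat=8460, step=60)
```

In this model the 15-minute "Socket time-out" window from the first post stays below the heartbeat, which is consistent with the graph showing no gap.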

The page http://rrdtool.vandenbogaerdt.nl/ is in general a very good source of in-depth information about the RRD structure and how it behaves.

If you lower the heartbeat of your RRD files, then shorter intervals with missing data will also show up as gaps in your graph.
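
As a rough sketch of what such a change could look like, the snippet below only builds an `rrdtool tune` command line and prints it; it does not run anything. The file name `example.rrd` and the heartbeat of 300 seconds are hypothetical, and the data source name `1` is taken from the `rrdtool info` output in this post. Whether Checkmk keeps a manually tuned heartbeat is not covered in this thread, so try it on a copy of the file first.

```python
import shlex


def tune_heartbeat_cmd(rrd_path, ds_name, heartbeat):
    """Build the argument list for `rrdtool tune --heartbeat ds-name:heartbeat`."""
    if heartbeat <= 0:
        raise ValueError("heartbeat must be a positive number of seconds")
    return ["rrdtool", "tune", rrd_path, "--heartbeat", f"{ds_name}:{heartbeat}"]


# Hypothetical file and value, for illustration only.
cmd = tune_heartbeat_cmd("example.rrd", "1", 300)
print(shlex.join(cmd))
```

The heartbeat is set per data source, so a file with several data sources needs one `--heartbeat` argument for each of them.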

The default values look like this:

rrd_version = "0003"
step = 60
last_update = 1731493351
header_size = 6448
ds[1].index = 0
ds[1].type = "GAUGE"
ds[1].minimal_heartbeat = 8460
ds[1].min = NaN
ds[1].max = NaN
ds[1].last_ds = "21215"
ds[1].value = 6.5766500000e+05
ds[1].unknown_sec = 0
ds[2].index = 1
ds[2].type = "GAUGE"
ds[2].minimal_heartbeat = 8460
ds[2].min = NaN
ds[2].max = NaN
ds[2].last_ds = "0.000020883"
ds[2].value = 6.4737300000e-04

8460 seconds → 141 minutes → as long as there is at least one entry every 140 minutes, you will see a continuous line in your graph.
As you can see, a heartbeat can be defined for every data source.

Conclusion: the behavior shown is completely normal for the default RRD configuration.
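
To check the heartbeat of your own files, the `rrdtool info` output can be parsed with a few lines of Python. This is a minimal sketch: the function name `heartbeats` is made up, and `INFO_SAMPLE` is just an excerpt of the output shown above rather than a live call to `rrdtool`.

```python
import re

# Excerpt of the `rrdtool info` output shown above.
INFO_SAMPLE = """\
step = 60
ds[1].minimal_heartbeat = 8460
ds[2].minimal_heartbeat = 8460
"""


def heartbeats(info_text):
    """Return (step, {ds_name: heartbeat_seconds}) from `rrdtool info` text."""
    step = None
    result = {}
    for line in info_text.splitlines():
        key, sep, val = line.partition(" = ")
        if not sep:
            continue  # not a "key = value" line
        if key == "step":
            step = int(val)
        else:
            m = re.fullmatch(r"ds\[(.+)\]\.minimal_heartbeat", key)
            if m:
                result[m.group(1)] = int(val)
    return step, result


step, hb = heartbeats(INFO_SAMPLE)
for name, seconds in hb.items():
    # Show the heartbeat in seconds, minutes, and number of steps.
    print(f"ds {name}: {seconds} s = {seconds / 60:g} min = {seconds // step} steps")
```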

Hi Andreas,

Thank you so much for the explanation. It's clear to me now.

BR.

Do you also have an explanation why the response times in Summary differ from Metrics?

Where can I configure the heartbeat? Especially when using the Checkmk Appliance? Are there rules for this, or do I have to edit any files?
