Charts performance tuning for Checkmk V2 CRE

We have an open query related to charts in the thread "Charts in Checkmk v2".

While that is being reviewed, I am trying to debug the latency observed when navigating through the charts on the central site, which connects to sites in other locations.

The use case: users log in to the central site and need to access charts for services located on other sites.

I am seeing a consistent wait time of 5 seconds on the page call below:

https://myfakehost//central/check_mk/ajax_graph_hover.py?_ajaxid=1643886485

If I understand it correctly, whenever the mouse moves over the chart, a fresh set of parameters is downloaded and the image in the chart section is adjusted accordingly.

My question is: can we optimize this response by adjusting or fine-tuning the caching somewhere?

The current wait time lets the experience down, as it takes far too long to see the change on the chart.

Any suggestions will be much appreciated.

As you are using the Raw Edition, there is a problem with encrypted Livestatus connections and high latency.
In the thread "Web Interface slow when Livestatus TLS Encrypt. enabled in distributed monitoring" (post #10 by decker) you will find a quick workaround for this latency problem.

If your Livestatus connection is not encrypted, I don't know where to start.

Thanks @andreas-doehler, giving it a shot.

I have disabled TLS in the distributed monitoring settings, but now I am not able to connect to the other site.
The port seems to be listening on the remote server.

Never mind, I forgot to turn it off in omd config :slight_smile:. I will review the performance and get back to you.

Unfortunately, disabling TLS didn't help much. Any ideas whether the charts are cached in some way, or whether any tuning can be done to optimize them?

That is more a question for the Munich team directly. In my distributed setups I have had no such big problems.

Are your sites on different continents too?

Could you point us to the team or resource who may be able to help in this regard? We have been struggling to get the charts working for some time, and the cutover is coming up soon.

There seems to be an interesting change here: version 1 seems to open the graphs on the local site, while version 2 seems to open them on the same site. (This would add the physical distance between the two sites to each request and increase the time many times over.)

It only looks that way. v1 also pulls the graph data from the remote site. The big difference is how it fetches the data. Raw Edition v1 uses PNP4Nagios and fetches the graph as a rendered PNG. That means only the PNG is transferred from your remote site to the local web server.
v2 uses the enterprise graphing system. This fetches the graph data via an AJAX request in the form of a JSON data structure.
The problem now is that every time you move the mouse over the graph and click, you fire the next AJAX request and fetch the graph data again. That easily adds up to 100 requests if you scroll a little inside the graph.
This is a real problem (Enterprise and Raw) and one for @elias.voelker or @sebkir.
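
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It combines the two figures from this thread (about 100 hover requests and about 5 seconds of wait per request). The function name and the `parallel` parameter are made up for illustration, and the value 6 is only an assumption about how many requests a browser might keep in flight to one host at a time. It is not a measurement of Checkmk.

```python
import math


def total_wait_seconds(requests: int, wait_per_request_s: float, parallel: int = 1) -> float:
    """Rough cumulative wait if requests are processed in batches of `parallel`.

    Hypothetical helper: assumes every request takes the same time and that
    a batch has to finish before the next one starts.
    """
    batches = math.ceil(requests / parallel)
    return batches * wait_per_request_s


# Figures taken from this thread: ~100 requests, ~5 s wait each.
print(total_wait_seconds(100, 5.0))              # one request at a time
print(total_wait_seconds(100, 5.0, parallel=6))  # assumed 6 requests in flight
```

Even under the optimistic assumption of several requests in flight, the waiting time adds up to more than a minute for a short interaction, which fits what is being reported here.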

Screenshot: a single load of the graph page is OK.

Screenshot: moving only a little inside one graph.

With this, it is clear that accessing graphs across continents will not work well.
Grafana, for example, only reloads the data after you submit the new time frame.
Moving the mouse over the graph produces no traffic there, which is also different from CMK graphing.
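
The difference between "one request per mouse event" and "one request after the user has settled" can be illustrated with a small simulation. This is only a sketch of the general debounce idea, not how Checkmk or Grafana are implemented. The function name, the 20 ms event spacing and the 250 ms quiet period are all assumptions chosen for the example.

```python
from typing import Optional, Sequence


def requests_fired(event_times_ms: Sequence[int], quiet_ms: Optional[int] = None) -> int:
    """Count requests for a sorted series of mouse event timestamps.

    quiet_ms=None: every event fires a request.
    quiet_ms=N:    trailing debounce, a request fires only after the pointer
                   has been idle for at least N milliseconds.
    """
    if not event_times_ms:
        return 0
    if quiet_ms is None:
        return len(event_times_ms)
    fired = 0
    # A request fires whenever the gap to the next event is long enough.
    for current, following in zip(event_times_ms, event_times_ms[1:]):
        if following - current >= quiet_ms:
            fired += 1
    # The last event is always followed by a quiet period.
    return fired + 1


# Hypothetical interaction: two bursts of 50 events each, 20 ms apart,
# separated by a pause of about one second.
events = list(range(0, 1000, 20)) + list(range(2000, 3000, 20))
print(requests_fired(events))                # request per event
print(requests_fired(events, quiet_ms=250))  # request per settled position
```

With these made-up numbers the per-event variant fires 100 requests and the debounced variant fires 2, which is the kind of gap that matters once every request has to cross a high-latency link.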

Thank you for the detailed explanation.