Latency on version 2.x

CMK version: 2.0.0p11 (CRE)
OS version: CentOS 7.9
We have set up a distributed monitoring cluster: a central site that connects to about 14 other sites.

The total host count is 2000, with 86K services. We are seeing unusual latency when searching for or opening any host or service.

Are there any recommendations for such a setup to reduce the latency, or for performance tuning in general?

It is possible that the problem mentioned here is also your problem.

The fix in that thread is very simple to implement, I think, and it only needs to be done on the master.


Trying it now! Thank you

@andreas-doehler I tried fiddling around with the TLS encryption setting, but didn't see much of a difference. I have tried to capture the pages that take a long time to load:

/check_mk/view.py?_show_filter_form=0&filled_in=filter&host=XXXXXXX&view_name=host
/check_mk/sidebar_snapin.py?names=tactical_overview&since=1642149172&_ajaxid=1642152170
/check_mk/view.py?filled_in=filter&host=XXXX&view_name=host&_display_options=htbfcoderuw&_do_actions=&_ajaxid=1642152337

They seem to have up to 20 s of “waiting” time. Any idea how this can be reduced? The hardware is a VM with 24 CPUs / 32 GB RAM.
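To check whether the delay is reproducible outside the browser, one could time the same requests from a script. Below is a minimal sketch using only the Python standard library. The host, port and any authentication headers are assumptions you would fill in for your own site (Checkmk pages need a logged-in session). The time until the response headers arrive roughly corresponds to the browser's “waiting” value.

```python
import http.client
import time


def time_page(host, port, path, headers=None, timeout=60.0, use_tls=False):
    """Return (status, seconds_to_headers, seconds_total) for one GET request.

    seconds_to_headers approximates the browser's "waiting" time: it covers
    connecting, sending the request, server-side processing and the headers.
    """
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        start = time.perf_counter()
        conn.request("GET", path, headers=headers or {})
        response = conn.getresponse()  # returns once status line and headers are read
        to_headers = time.perf_counter() - start
        response.read()  # download the body as well
        total = time.perf_counter() - start
        return response.status, to_headers, total
    finally:
        conn.close()
```

For example, `time_page("monitoring.example.com", 443, "/mysite/check_mk/view.py?view_name=host&host=XXXX", headers={"Cookie": "..."}, use_tls=True)`, where the host name, the site name `mysite` and the cookie are placeholders. Running it repeatedly against the central site and against a remote site shows whether the waiting time follows one particular site.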

Also, what do you recommend regarding swapping?

If the request is “waiting”, this means the remote site that holds your host data is not responding quickly.
You can test if this happens for hosts from all your sites.
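One way to run that test is to send the same Livestatus query directly to each site and compare the response times. A minimal Python sketch, assuming every site has Livestatus via TCP enabled (in Checkmk this is typically switched on with `omd config set LIVESTATUS_TCP on` and listens on port 6557). The site names and addresses you pass in are placeholders.

```python
import socket
import time


def time_livestatus_query(host, port, query, timeout=30.0):
    """Send one Livestatus query over TCP and return (elapsed_seconds, raw_reply)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # half-close: tells the server the query is complete
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # server closed the connection after answering
                break
            chunks.append(data)
    return time.perf_counter() - start, b"".join(chunks)


def survey_sites(sites, query):
    """Time the same query against every site; slowest (or failing) site first."""
    results = []
    for name, (host, port) in sites.items():
        try:
            elapsed, _ = time_livestatus_query(host, port, query)
            results.append((name, elapsed, None))
        except OSError as exc:  # includes timeouts and refused connections
            results.append((name, float("inf"), str(exc)))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

For example, `survey_sites({"site01": ("10.0.0.11", 6557), "site02": ("10.0.0.12", 6557)}, "GET services\nColumns: host_name description state\n\n")` with hypothetical site names and addresses. A site that is consistently at the top of the list is the one to look at first.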

Swapping is bad on all server systems 🙂
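To check whether the VM is actually swapping (as opposed to merely having swap space allocated), one can compare the kernel's cumulative `pswpin`/`pswpout` counters in `/proc/vmstat` over an interval; the `si`/`so` columns of `vmstat` show the same information. A small Linux-only Python sketch:

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat ("name value" per line) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_page_rates(before, after, seconds):
    """Pages swapped in and out per second between two /proc/vmstat snapshots."""
    b, a = parse_vmstat(before), parse_vmstat(after)
    return (
        (a["pswpin"] - b["pswpin"]) / seconds,
        (a["pswpout"] - b["pswpout"]) / seconds,
    )


def sample_swap_rates(interval=10.0):
    """Take two snapshots of /proc/vmstat `interval` seconds apart (Linux only)."""
    with open("/proc/vmstat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/vmstat") as f:
        after = f.read()
    return swap_page_rates(before, after, interval)
```

Sustained nonzero rates while the GUI feels slow would suggest memory pressure rather than a remote-site problem.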

Good that we think the same about swapping. I have tested it from the local site too, with more or less the same results.

Will check on other sites and get back to you

@andreas-doehler This seems to be common to multiple sites. I see high disk usage by the rrdcached process.
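To put a number on that disk usage, the per-process counters in `/proc/<pid>/io` can be sampled twice; `write_bytes` counts the bytes the process caused to be sent to the storage layer. A minimal Linux-only Python sketch (reading another user's `/proc/<pid>/io` needs root or the same user, e.g. the site user). Tools such as `iotop` or `pidstat -d` show the same data interactively.

```python
import os


def parse_proc_io(text):
    """Parse /proc/<pid>/io ("name: value" per line) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def write_rate_mib(before, after, seconds):
    """MiB/s sent to the storage layer between two /proc/<pid>/io snapshots."""
    b, a = parse_proc_io(before), parse_proc_io(after)
    return (a["write_bytes"] - b["write_bytes"]) / seconds / (1024 * 1024)


def find_pids(name):
    """PIDs whose /proc/<pid>/comm equals `name`, e.g. "rrdcached" (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/comm") as f:
                    if f.read().strip() == name:
                        pids.append(int(entry))
            except OSError:
                pass  # process exited or is not readable
    return pids
```

With several sites on one host, `find_pids("rrdcached")` returns one PID per site, so each site's write rate can be compared.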

This site was created from a 1.4 backup and upgraded to version 2.x.

rrdcached seems to write to disk asynchronously. We use DRBD to make the site highly available (HA), which may also contribute some latency.

Would you recommend changing the rrdcached parameters from the settings below?
[Screenshot: current rrdcached settings]

That’s normal. As the name already says, it caches writes of RRD data and writes all the data out after a specific time or when requested.
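That behaviour can be illustrated with a toy write-back cache: updates are buffered per file and written out once the oldest buffered update for that file is older than a timeout, or immediately when a flush is requested (for example before a graph is rendered). This Python sketch is a conceptual model only, not rrdcached's actual implementation; the timeout corresponds roughly to rrdcached's `-w` option, and `WriteBackCache` and its methods are hypothetical names.

```python
import time


class WriteBackCache:
    """Toy model of rrdcached-style write caching (not the real daemon)."""

    def __init__(self, write_timeout, clock=time.monotonic):
        self.write_timeout = write_timeout
        self.clock = clock
        self.pending = {}  # filename -> (time of first buffered update, [values])
        self.disk = {}     # stands in for the RRD files on disk

    def update(self, filename, value):
        now = self.clock()
        _, values = self.pending.setdefault(filename, (now, []))
        values.append(value)  # buffered in memory, no disk write yet
        self._flush_expired(now)

    def flush(self, filename):
        """Write one file's buffered values now, e.g. before a graph is drawn."""
        if filename in self.pending:
            self._write(filename)

    def _flush_expired(self, now):
        expired = [f for f, (first, _) in self.pending.items()
                   if now - first >= self.write_timeout]
        for filename in expired:
            self._write(filename)

    def _write(self, filename):
        _, values = self.pending.pop(filename)
        self.disk.setdefault(filename, []).extend(values)  # one batched write
```

Batching like this turns many small updates into fewer, larger writes, which is why rrdcached's disk activity can look bursty.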

This behavior should not impact your slow response problem.

If you access the web interface on a single slave directly, there should be no latency problem.