CMK version: 2.0.0p11 (CRE)
OS version: CentOS 7.9
We have set up a cluster with a distributed monitoring central site, which connects to about 14 other sites.
The total host count is 2000 with 86K services, and we are seeing unusual latency when searching for or opening any host or service.
Are there any recommendations for reducing latency and tuning performance in such a setup?
It is possible that the problem mentioned here is also your problem.
Hi,
we are running checkmk 2.0p17 cre with an encrypted livestatus connection, too. The GUI is really slow. After we disabled TLS encryption for all livestatus queries, everything is working fast and responsive again.
Before that, we had a 1.6 CRE up and running in a distributed environment with TLS enabled for livestatus queries and had no slowdown in the web UI.
bye
David
The fix in the thread is very simple to implement, I think, and it only needs to be done on the master.
@andreas-doehler I tried fiddling around with the TLS encryption but didn't see much of a difference. I have tried to capture the pages that take a long time to load:
/check_mk/view.py?_show_filter_form=0&filled_in=filter&host=XXXXXXX&view_name=host
/check_mk/sidebar_snapin.py?names=tactical_overview&since=1642149172&_ajaxid=1642152170
/check_mk/view.py?filled_in=filter&host=XXXX&view_name=host&_display_options=htbfcoderuw&_do_actions=&_ajaxid=1642152337
They seem to have up to 20 sec of “waiting” time. Any idea how this can be reduced? The hardware is a VM with 24 CPUs / 32 GB RAM.
Also, what do you recommend for swapping?
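The “waiting” time seen in the browser can also be measured from a script, which makes it easier to repeat the same request against several hosts or sites. The following is a minimal sketch using only the Python standard library. The function name `time_to_first_byte` is made up for illustration, and authentication is left to the caller through `headers` (for example a session cookie copied from the browser), since the pages above require a login.

```python
import http.client
import time
from urllib.parse import urlsplit


def time_to_first_byte(url, headers=None, timeout=60.0):
    """Return (HTTP status, seconds until headers arrived, total seconds).

    Note: the first figure includes connection setup, so it is slightly
    larger than the pure "waiting" phase shown by browser developer tools.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)

    # Rebuild the request target (path plus query string).
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    try:
        start = time.monotonic()
        conn.request("GET", path, headers=headers or {})
        # getresponse() returns once the status line and headers are read.
        response = conn.getresponse()
        waiting = time.monotonic() - start
        response.read()  # drain the body to get the total time
        total = time.monotonic() - start
        return response.status, waiting, total
    finally:
        conn.close()


# Example call (hypothetical server name, placeholder host as in the post):
# status, waiting, total = time_to_first_byte(
#     "https://monitoring.example/check_mk/view.py?host=XXXX&view_name=host",
#     headers={"Cookie": "<session cookie from the browser>"},
# )
```

Running this once for a host on each remote site shows whether the delay is tied to particular sites or affects all of them.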
If it is waiting, this means the remote site that holds your host data is not responding quickly.
You can test if this happens for hosts from all your sites.
Swapping is bad on all server systems.
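To see whether the server is actually using swap, the kernel counters in `/proc/meminfo` can be read directly. This is a small sketch for Linux only; the helper names `parse_meminfo` and `swap_used_kib` are illustrative and not part of Checkmk.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values.

    Most entries are reported in kB; entries without a numeric value
    are skipped.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kib(meminfo):
    """Swap currently in use, computed as SwapTotal - SwapFree."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live counters from the running system."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


# Example on the monitoring server:
# print(swap_used_kib(read_meminfo()), "kB of swap in use")
```

A value that stays above zero and keeps growing while the GUI is slow would be worth investigating before any other tuning.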
Good that we think the same about swapping. I have tested it from a local site too, and the results seem to be more or less the same.
Will check on other sites and get back to you.
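One way to compare the sites is to time the same Livestatus query against each of them, bypassing the web GUI. The sketch below speaks the plain Livestatus text protocol over a socket. It rests on a few assumptions: the local socket is at `tmp/run/live` below the site user's home directory, remote sites listen on TCP port 6557, and the connection is not TLS-encrypted (an encrypted connection would need the socket wrapped with the `ssl` module first). The function names are illustrative.

```python
import json
import socket
import time


def build_query(table, columns, filters=()):
    """Build a Livestatus query that asks for JSON output."""
    lines = ["GET " + table, "Columns: " + " ".join(columns)]
    lines += ["Filter: " + f for f in filters]
    lines.append("OutputFormat: json")
    # A blank line terminates the request.
    return "\n".join(lines) + "\n\n"


def timed_query(sock, query):
    """Send a query on a connected socket; return (rows, seconds)."""
    start = time.monotonic()
    sock.sendall(query.encode("utf-8"))
    sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
    chunks = []
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            break
        chunks.append(chunk)
    elapsed = time.monotonic() - start
    return json.loads(b"".join(chunks).decode("utf-8")), elapsed


def query_site(address, query, timeout=30.0):
    """address is a Unix socket path (str) or a (host, port) tuple."""
    family = socket.AF_UNIX if isinstance(address, str) else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(address)
        return timed_query(sock, query)


# Example (hypothetical site name and host):
# q = build_query("hosts", ["name", "state"])
# rows, secs = query_site("/omd/sites/mysite/tmp/run/live", q)
# rows, secs = query_site(("remote1.example", 6557), q)
```

If one site answers in milliseconds over the local socket but takes seconds from the central site, the delay is in the connection between the sites rather than in the monitoring core.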
@andreas-doehler This seems to be common to multiple sites. I see high disk usage from the rrdcached process.
This site was created from a 1.4 backup and upgraded to version 2.x.
rrdcached seems to write to disk asynchronously. We are using DRBD to provide HA for the site, which may also be contributing to some latency.
Do you recommend changing the rrdcached parameters from the settings below?
That’s normal. As the name already says, it caches the writes for RRD data and writes out all data after a specific time or when requested.
This behavior should not impact your slow response problem.
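To put a number on how much rrdcached really writes, the per-process I/O counters in `/proc/<pid>/io` can be sampled twice and compared. This is a rough Linux-only sketch with illustrative helper names. Reading another user's `/proc/<pid>/io` normally requires running as that user or as root.

```python
import os


def parse_proc_io(text):
    """Parse /proc/<pid>/io content into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def find_pids(name, proc_root="/proc"):
    """Return the PIDs whose command name (comm) equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                if handle.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return sorted(pids)


def write_rate(before, after, seconds):
    """Bytes per second sent to the storage layer between two samples."""
    return (after["write_bytes"] - before["write_bytes"]) / seconds


# Example: sample each rrdcached process twice, 10 seconds apart.
# import time
# for pid in find_pids("rrdcached"):
#     path = "/proc/%d/io" % pid
#     first = parse_proc_io(open(path).read())
#     time.sleep(10)
#     second = parse_proc_io(open(path).read())
#     print(pid, write_rate(first, second, 10.0), "bytes/s")
```

Comparing this rate with and without a slow page load in progress gives a hint as to whether the disk activity and the GUI latency are related at all.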
If you access the web interface on a single slave, there should be no latency problem.