We’ve been running the latest 1.6 release of the Enterprise edition with 260 hosts and 4700 services.
With 1.6 everything was fast and smooth.
Since we upgraded to 2.0, the Web UI is sometimes really slow.
I cannot see any errors in the log, nor any unusually high CPU or RAM usage.
Overall, the UI is 3-5 times slower than before.
There are several possible causes.
When the UI is slow, the first thing to look at is the Livestatus connection to the core.
How is this setup in your system?
If it is the Enterprise edition, it should use the Livestatus proxy even for the local connection.
Here you can take a look at the Livestatus status file. If all channels are in use all the time, you will get a slow UI.
All channels being in use can have different causes.
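To check whether the Livestatus connection itself is slow, one option is to time a simple query against the core's socket. The following is only a minimal sketch: it assumes the usual Livestatus text protocol over a Unix socket, and the socket path `tmp/run/live` inside the site directory as well as the chosen `status` columns are assumptions you may need to adjust for your setup.

```python
import socket
import time


def build_query(table, columns):
    """Build a Livestatus query that returns the given columns of a table."""
    # A Livestatus query is plain text; a blank line terminates it.
    return f"GET {table}\nColumns: {' '.join(columns)}\n\n"


def timed_query(socket_path, query, timeout=10.0):
    """Send one query to a Livestatus Unix socket, return (rows, seconds)."""
    start = time.perf_counter()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        # Closing the write side tells the server that the query is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    elapsed = time.perf_counter() - start
    text = b"".join(chunks).decode("utf-8")
    # The default output format is one row per line, fields separated by ';'.
    rows = [line.split(";") for line in text.splitlines() if line]
    return rows, elapsed


# Example call as the site user (the path is an assumption, adjust as needed):
# rows, seconds = timed_query("tmp/run/live",
#                             build_query("status", ["program_version", "num_hosts"]))
# print(rows, f"{seconds:.3f}s")
```

If the same query is fast against the local socket but slow through the proxy or against a remote site, that points at the connection rather than the core.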
In my case I have an instance acting as a hub for four instances that do the actual checks and probes. The Livestatus proxy daemon is set up for all instances.
Can you name any sources where I can find parameters to tune and how to analyse the situation? I didn’t even find anything about a Livestatus status file via Google.
Unfortunately, on this side everything seems to be fine, but I’ll take a look the next time I experience any slowness.
Update: I just had a reproducible slow UI again when accessing the notification rules. I then disabled the remote-site access via liveproxy, and suddenly all performance issues were gone.
But I wonder what could be tuned in liveproxy, since it always showed all channels as ready.
@righter In the Distributed Monitoring section, when you edit a connection, the Status connection section has the option Use Livestatus Proxy Daemon, where you can select Connect directly.
Do any of you possibly use the dashlets “Host statistics” or “Service statistics”? Although you stated that you do not see an “increase in CPU”*, perhaps you could still check whether you are using them. If so, please try removing and re-adding them and report back. There will be a fix for this specific issue in the p10 → Fixed steadily rising CPU due to misconfiguration when cloning builtin dashboards
* It would be interesting to know what exactly you mean. Load or utilization? Additionally, I would like to point to a very interesting article in our official guide about monitoring the CPU utilization, which also applies to this scenario. It might be the case that only one process is causing a lot of trouble and blocking an entire core without you noticing, because the 7 - 127 other cores don’t have anything to do and hence the (average) CPU utilization is very low. Check out at least Chapter 1 in the following article. It was an eye opener for me a couple of years ago: Best practices, tips & tricks
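To illustrate the point about averages hiding a single saturated core, here is a small sketch that computes per-core utilization from two snapshots in the Linux `/proc/stat` format. The snapshot values below are made up for the example, and the function names are purely illustrative; on a real system you would read `/proc/stat` twice, a few seconds apart.

```python
def parse_proc_stat(text):
    """Return {core: (busy_ticks, total_ticks)} for the per-core 'cpuN' lines."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and anything that is not a CPU line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Fields: user nice system idle iowait irq softirq steal
        ticks = [int(value) for value in fields[1:9]]
        idle = ticks[3] + ticks[4]  # idle + iowait count as "not busy"
        total = sum(ticks)
        result[fields[0]] = (total - idle, total)
    return result


def per_core_utilization(before, after):
    """Return {core: busy percent} between two parsed snapshots."""
    usage = {}
    for core, (busy1, total1) in after.items():
        busy0, total0 = before.get(core, (0, 0))
        delta_total = total1 - total0
        usage[core] = 100.0 * (busy1 - busy0) / delta_total if delta_total else 0.0
    return usage


# Made-up snapshots: cpu0 is fully busy, the other three cores are idle.
SNAPSHOT_1 = "\n".join(f"cpu{i} 0 0 0 0 0 0 0 0" for i in range(4))
SNAPSHOT_2 = "\n".join(
    ["cpu  100 0 0 300 0 0 0 0", "cpu0 100 0 0 0 0 0 0 0"]
    + [f"cpu{i} 0 0 0 100 0 0 0 0" for i in range(1, 4)]
)

usage = per_core_utilization(parse_proc_stat(SNAPSHOT_1), parse_proc_stat(SNAPSHOT_2))
average = sum(usage.values()) / len(usage)
busiest = max(usage, key=usage.get)
print(f"average {average:.1f}%, busiest {busiest} at {usage[busiest]:.1f}%")
# prints: average 25.0%, busiest cpu0 at 100.0%
```

With 8 or 128 cores the average drops even further while one core stays pinned, which is why the per-core view matters here.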
In my case I neither see any significant load nor CPU utilisation, no hanging processes.
The phenomenon when using the liveProxyDaemon is that occasionally, when you enter a page (e.g. the services of a host), the page loads forever until it is reloaded once or twice.
I wish I could provide any useful information as a clue to the cause …
I doubt the stats dashlets are involved in that, since we normally don’t use any of the dashboards.
To me, as an ordinary user, it feels like some sort of broken pipe in the UI.
In my case, after trying a lot of things to narrow down the problem, it turned out that occasional packet loss on the route to the central instance was causing the slowness.
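For anyone who wants to check for this kind of problem, a rough way to spot it is to time repeated TCP connects to the other site's Livestatus port. This is only a sketch, not what I actually used: the host name is a placeholder, port 6557 is assumed to be the Livestatus TCP port in your setup, and `probe_connects` is an illustrative helper name.

```python
import socket
import time


def probe_connects(host, port, attempts=20, timeout=2.0):
    """Time repeated TCP connects; return (latencies_in_seconds, failures)."""
    latencies = []
    failures = 0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # Open and immediately close a connection, measuring only the connect.
            with socket.create_connection((host, port), timeout=timeout):
                latencies.append(time.perf_counter() - start)
        except OSError:
            # Covers timeouts, refused connections and unreachable hosts.
            failures += 1
    return latencies, failures


# Example call (host and port are assumptions, adjust to your environment):
# latencies, failures = probe_connects("central.example.com", 6557)
# print(f"max {max(latencies):.3f}s, failures {failures}")
```

Most connects being fast with a few outliers that take around a second or more, or that time out, would be consistent with occasional packet loss, although other causes are possible.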