I hope somebody can help me with the following problem.
I am running omd (v 0.57.20130515) and check_mk 1.2.3i1 with a Nagios core. My main site has 500 hosts and 17000 services. While Nagios works fine (checks are running, notifications are sent), Multisite and the whole Apache become unresponsive from time to time. The only thing that solves it for the moment is running “omd restart”.
Any idea on how to fix this?
--
Mathias Decker
IT User-Support / Monitoring / Windows- and Linux-Server
Are you using LDAP/AD for authentication? One reason can be that LDAP/AD is taking a long time to connect. Second, if your LDAP/AD returns a huge amount of data, I've seen Apache become unresponsive.
You may edit /etc/httpd/conf/httpd.conf and lower the MaxRequestsPerChild value to something like...
MaxRequestsPerChild 400
See if this helps.
ShiB.
while ( ! ( succeed = try() ) );
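To judge whether a change like the one above helps, it can be useful to record how long the web interface takes to answer. The following is only a minimal sketch and not part of the original advice: the function name `probe`, the 10 second default timeout, and the URL in the usage note are assumptions to replace with values from your own site. Proxies are bypassed on purpose, since the probe is meant to hit the local Apache directly.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return (ok, seconds): did the server answer at all, and how long did it take."""
    # An empty ProxyHandler disables any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as response:
            response.read(1)  # make sure at least the start of the body arrives
        ok = True
    except urllib.error.HTTPError:
        # The server replied with an error status (for example 401).
        # That still counts as responsive for this purpose.
        ok = True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a broken reply.
        ok = False
    return ok, time.monotonic() - start
```

Called from cron as something like `probe("http://localhost/mysite/check_mk/")` (a made-up URL), the logged times would show whether the hangs become rarer after the change.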
Initially we had the same issue once we passed 300 to 400 nodes or so.
It seemed to occur more often with IE and less with Firefox/Chrome, but it did happen from time to time. Restarting Apache seemed to fix it for a few hours.
After telling Check_MK not to use buffered HTTP streams (global options menu), our issues went away completely.
Regards,
Nico
----- Original Message -----
From: Decker, Mathias [mailto:mathias.decker@mdc-berlin.de]
To: checkmk-en@lists.mathias-kettner.de [mailto:checkmk-en@lists.mathias-kettner.de]
Sent: Tue, 21 May 2013 11:16:46 +0200
Subject: [Check_mk (english)] unresponsive apache in omd
A crucial point is also the number of Livestatus threads. By default this is set to 20, but especially if you are using persistent connections, this can be too low. You set this option where you load Livestatus (please refer to the online documentation).
Mathias
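To see whether Livestatus itself still answers while the GUI hangs, one can talk to its socket directly. The sketch below is an illustration and not taken from the thread. It assumes a Livestatus Unix socket (in an OMD site this is usually found below the site directory, for example `tmp/run/live`) and uses the plain-text query format: a `GET` line, header lines, and a terminating blank line. The function names are hypothetical.

```python
import socket


def build_query(table, columns):
    """Build a Livestatus query: GET line, Columns header, then a blank line."""
    return "GET %s\nColumns: %s\n\n" % (table, " ".join(columns))


def query_livestatus(socket_path, query, timeout=5.0):
    """Send one query to a Livestatus Unix socket and return the raw text reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)  # a hanging Livestatus raises a timeout error here
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tell the server the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```

For example, `query_livestatus("tmp/run/live", build_query("hosts", ["name", "state"]))` should return one line per host. If such a query runs into the timeout while Nagios is otherwise working, that may be a hint that all client threads are occupied, though other causes are possible.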
On 22.05.2013 22:48, Nico van Royen wrote:
Hi Mathias,
Initially we had the same issue once we passed 300 to 400 nodes or so.
It seemed to occur more often with IE and less with Firefox/Chrome, but it did happen from time to time. Restarting Apache seemed to fix it for a few hours.
After telling Check_MK not to use buffered HTTP streams (global options menu), our issues went away completely.
Regards,
Nico
I don't know. My problem is on the Multisite system that connects to the other Multisite instances via Livestatus. Livestatus isn't enabled on my main site, AFAIK.
We also get this issue, but in 99% of the cases MK/OMD is not the cause. Our version is 1.2.0.p2.
Setup (not your standard setup):
1 master site with 1 host. We cannot monitor the remote hosts from the master because we have no direct connection to them. There are SSH tunnels set up, so local host monitoring will not work, as all the tunnels end on the same external server, which is always up and running.
20 OMD slave sites
20 links between the sites
The issue we have seen:
The master site times out when a slave site is unable to write to disk because the SAN is too busy with other VM workloads.
At this point we connect to the slave site and run omd stop, then set that site to disabled on the master, then run omd start on the slave, just so that site has monitoring.
Restarting OMD on the master looks like it fixes the issue, but everything is only happy once we disable the slave site that is not working correctly.
If Distributed Monitoring supported a port number under host status, that would help solve our issue.
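Finding the slave site that needs to be disabled could be shortened with a quick reachability check from the master. This is a sketch under assumptions, not something from the setup described above: it presumes each slave's Livestatus is reachable over TCP (for instance through the tunnels), and the function names as well as any site names, hosts, and ports you pass in are made up for illustration.

```python
import socket


def site_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unreachable_sites(sites, timeout=3.0):
    """Given {site_name: (host, port)}, return the sorted names that did not accept a connection."""
    return sorted(
        name
        for name, (host, port) in sites.items()
        if not site_reachable(host, port, timeout)
    )
```

Note the limitation: an accepted TCP connection only shows that something is listening. A slave that is stuck waiting on disk may still accept connections and then never answer, so a check like this can miss exactly the case described above unless it also sends a real query with a timeout.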