The site in question is running Ubuntu 14.04.3 LTS.
It may be useful to know that the other remote site (mentioned earlier in this thread, running the same configuration) is on Ubuntu 12.04.4 LTS. Could a different Apache version or configuration shipped by the package maintainers be related to the problem?
Regards,
Paul
On 23/09/15 15:11, Marcel Schulte wrote:
Hi Paul,
Just a passing thought... Which OS do you use at the remote/slave sites? Is SELinux enabled?
I read something similar last week, see this thread: http://lists.mathias-kettner.de/pipermail/omd-users/2015-September/001328.html
Regards,
Marcel

On 23/09/15 15:00, Paul Bongers <Paul.Bongers@osudio.com> wrote:
Hi list,
Hoping to see a difference, I tried switching the Apache mode to shared.
I still get the same errors, but this time I see a Python
traceback in the Apache error log (with debug enabled).

Hopefully this helps in solving the issue.

From the error log:
[Wed Sep 23 12:55:41.293558 2015] [:error] [pid 11025:tid 140673877477120] [client 192.168.99.220:65432] python_handler: Dispatch() returned non-integer.
[Wed Sep 23 12:55:41.293622 2015] [mpm_event:debug] [pid 11025:tid 140673877477120] event.c(992): (103)Software caused connection abort: [client 192.168.99.220:65432] AH00470: network write failure in core output filter
Traceback (most recent call last):
  File "/omd/versions/1.2.6p10.cre/lib/python/mod_python/importer.py", line 1934, in ReportError
    req.write(text)

Regards, Paul
On 09/09/15 14:02, Paul Bongers wrote:
Hi Marcel,
All three sites are running Check_MK version 1.2.6p10 and have
(default) OMD version 1.2.6p10.cre.

$ cmk --version | head -n1
This is check_mk version 1.2.6p10

# omd version
OMD - Open Monitoring Distribution Version 1.2.6p10.cre

I currently suspect that it has something to do with the
Apache configuration on the remote site.

This is what I found in the Apache error log:
[Wed Sep 09 11:23:02.438919 2015] [proxy_http:error] [pid 32174:tid 139895238506240] (104)Connection reset by peer: [client 192.168.99.220:10850] AH01095: prefetch request body failed to 127.0.0.1:5000 (127.0.0.1) from 192.168.99.220 ()

The timestamp of this entry matches the timestamp I found in the
access log when the master site is trying to push the configuration.

With Apache's log level increased to debug, I'm seeing this in the
logs:
[Wed Sep 09 11:29:21.597416 2015] [authz_core:debug] [pid 10337:tid 139895447553792] mod_authz_core.c(828): [client 192.168.99.220:63828] AH01628: authorization result: granted (no directives)
[Wed Sep 09 11:29:21.597473 2015] [proxy:debug] [pid 10337:tid 139895447553792] mod_proxy.c(1104): [client 192.168.99.220:63828] AH01143: Running scheme http handler (attempt 0)
[Wed Sep 09 11:29:21.597480 2015] [proxy:debug] [pid 10337:tid 139895447553792] proxy_util.c(2020): AH00942: HTTP: has acquired connection for (127.0.0.1)
[Wed Sep 09 11:29:21.597484 2015] [proxy:debug] [pid 10337:tid 139895447553792] proxy_util.c(2072): [client 192.168.99.220:63828] AH00944: connecting http://127.0.0.1:5000/<site_id>/check_mk/automation.py?command=push-snapshot&secret=%3BO%3FX3JG%3E6CC1%3DSHAMJHI%3FX%3A%40N8B0J%3E0U&siteid=<site_id>&mode=slave&restart=yes&debug= to 127.0.0.1:5000
[Wed Sep 09 11:29:21.597556 2015] [proxy:debug] [pid 10337:tid 139895447553792] proxy_util.c(2206): [client 192.168.99.220:63828] AH00947: connected /<site_id>/check_mk/automation.py?command=push-snapshot&secret=%3BO%3FX3JG%3E6CC1%3DSHAMJHI%3FX%3A%40N8B0J%3E0U&siteid=<site_id>&mode=slave&restart=yes&debug= to 127.0.0.1:5000
[Wed Sep 09 11:29:21.597605 2015] [proxy:debug] [pid 10337:tid 139895447553792] proxy_util.c(2610): AH00962: HTTP: connection complete to 127.0.0.1:5000 (127.0.0.1)
[Wed Sep 09 11:29:21.633003 2015] [proxy_http:error] [pid 10337:tid 139895447553792] (104)Connection reset by peer: [client 192.168.99.220:63828] AH01095: prefetch request body failed to 127.0.0.1:5000 (127.0.0.1) from 192.168.99.220 ()
[Wed Sep 09 11:29:21.633019 2015] [proxy:debug] [pid 10337:tid 139895447553792] proxy_util.c(2035): AH00943: HTTP: has released connection for (127.0.0.1)
[Wed Sep 09 11:29:21.633095 2015] [mpm_event:debug] [pid 10337:tid 139895447553792] event.c(992): (32)Broken pipe: [client 192.168.99.220:63828] AH00470: network write failure in core output filter

A web search turned up several hits suggesting that mod_proxy
throws this error when the file upload (POST data) is too big.

Regards, Paul
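For anyone sifting through a debug-level log like the ones quoted above, here is a minimal Python sketch that splits Apache 2.4 error log lines into fields and keeps only the error-level entries. It assumes each entry sits on a single line, as Apache writes it, rather than wrapped as in this mail. The function names are made up for illustration and are not part of OMD or Check_MK.

```python
import re

# Matches the prefix of an Apache 2.4 error log line:
# [timestamp] [module:level] [pid N:tid N] message
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] "
    r"\[(?P<module>[^:\]]*):(?P<level>[^\]]+)\] "
    r"\[pid (?P<pid>\d+):tid (?P<tid>\d+)\] "
    r"(?P<message>.*)$"
)
# Apache message identifiers look like AH01095.
AH_CODE = re.compile(r"\bAH\d{5}\b")


def parse_error_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    code = AH_CODE.search(entry["message"])
    entry["ah_code"] = code.group(0) if code else None
    return entry


def error_entries(lines):
    """Return the parsed entries whose log level is 'error'."""
    parsed = (parse_error_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "error"]
```

Feeding it the lines of the site's Apache error log should leave only entries such as the AH01095 one, which makes it easier to match them against the access log timestamps.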
On 09/09/15 12:19, Marcel Schulte wrote:
Hi Paul,
As I already said, I have no remote sites... But I have read about
version differences causing problems. What versions are your
master and slave sites at?

* master site
* working local slave
* faulting remote slave

Regards,
Marcel

On 09/09/15 12:09, Paul Bongers <Paul.Bongers@osudio.com> wrote:

To find more information on what's going wrong, I
added a bit of code to wato.py so that the command used to
push changes to the remote site is displayed in the error.
Then I ran the command from the shell, adding some verbosity:

OMD[main]:~$ curl -vv -b /dev/null -L -w " %{http_code}\n" \
    -s -S -F \
    snapshot=@/omd/sites/main/tmp/check_mk/sync-<site_id>.tar.gz \
    "http://<remote_host>/<site_id>/check_mk/automation.py?command=push-snapshot&secret=%3BO%3FX3JG%3E6CC1%3DSHAMJHI%3FX%3A%40N8B0J%3E0U&siteid=<site_id>&mode=slave&restart=yes&debug=" 2>&1
* Hostname was NOT found in DNS cache
* Trying <ip>...
* Connected to <remote_host> (<ip>) port 80 (#0)
> POST /<site_id>/check_mk/automation.py?command=push-snapshot&secret=%3BO%3FX3JG%3E6CC1%3DSHAMJHI%3FX%3A%40N8B0J%3E0U&siteid=<site_id>&mode=slave&restart=yes&debug= HTTP/1.1
> User-Agent: curl/7.35.0
> Host: <remote_host>
> Accept: */*
> Content-Length: 72203
> Expect: 100-continue
> Content-Type: multipart/form-data; boundary=------------------------66e0b55bb4881b35
>
< HTTP/1.1 100 Continue
* Recv failure: Connection reset by peer
* Closing connection 0
 100
curl: (56) Recv failure: Connection reset by peer

Note that a local slave that is configured exactly the same
way is updated just fine.
What is going wrong here?

Regards, Paul
On 08/09/15 14:06, Paul Bongers wrote:
I've opened up port 6557 on the firewall, but I still get
an error when applying changes.
The error message is:

Error: HTTP Error - 100: curl: (56) Recv failure: Connection reset by peer

Also, the remote shows up as dead in WATO, as long as I
have Livestatus TCP disabled.
Changing the connection to 'Connect via TCP' instead of
'Use Livestatus Proxy-Daemon' doesn't change anything.

For testing purposes I added another slave, that resides on
the same network as the master. This slave has the same
configuration as the remote one and is configured the same
way on the master server. The local slave works just fine.

Therefore, I get the impression that some other port(s)
still need(s) to be opened.

What am I missing here?
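One way to narrow down a suspected firewall problem is to test from the master whether the relevant TCP ports accept connections at all. The following is a minimal Python sketch under that assumption. The function names are hypothetical, and a successful connect only shows that the port is reachable, not that a large POST will pass through.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the full TCP handshake, so a firewall
        # that drops or rejects the connection makes this fail.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports):
    """Map each port number to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling `check_ports("remote_host_name", [80, 6557])` from the master would cover the HTTP port used for the configuration push and the Livestatus TCP port mentioned in this thread. `remote_host_name` is a placeholder.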
Configuration of the slave site:
$ omd config show
ADMIN_MAIL:
APACHE_MODE: own
APACHE_TCP_ADDR: 127.0.0.1
APACHE_TCP_PORT: 5000
AUTOSTART: on
CORE: nagios
CRONTAB: on
DEFAULT_GUI: check_mk
DOKUWIKI_AUTH: off
LIVEPROXYD: off
LIVESTATUS_TCP: on
LIVESTATUS_TCP_PORT: 6557
MKEVENTD: off
MKNOTIFYD: on
MULTISITE_AUTHORISATION: on
MULTISITE_COOKIE_AUTH: on
NAGIOS_THEME: classicui
NAGVIS_URLS: auto
NSCA: on
NSCA_TCP_PORT: 5667
PNP4NAGIOS: on
TMPFS: on

Slave configuration on the master site (retrieved from
$OMD_HOME/etc/check_mk/liveproxyd.mk):

sites = \
{'site_name': {'cache': True,
               'channel_timeout': 3.0,
               'channels': 5,
               'connect_retry': 4.0,
               'heartbeat': (5, 2.0),
               'query_timeout': 120.0,
               'socket': ('remote_host_name', 6557)}}

Regards,
Paul
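To double-check which host and port the master will actually try to reach, the `sites` dictionary can be read without executing the file. This is a minimal Python sketch, assuming the file contains a plain `sites = {...}` assignment made of literals and that TCP sockets are stored as `(host, port)` tuples, as in the configuration above. The helper names are invented for illustration.

```python
import ast


def load_sites(mk_source):
    """Extract the `sites` dict from liveproxyd.mk-style source text."""
    tree = ast.parse(mk_source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "sites":
                    # literal_eval only accepts plain literals, so nothing
                    # in the file is executed.
                    return ast.literal_eval(node.value)
    raise ValueError("no 'sites' assignment found")


def tcp_endpoints(sites):
    """Return {site_id: (host, port)} for sites configured with a TCP socket."""
    endpoints = {}
    for site_id, settings in sites.items():
        sock = settings.get("socket")
        if isinstance(sock, tuple) and len(sock) == 2:
            endpoints[site_id] = sock
    return endpoints


# The configuration quoted above, with its placeholder names.
EXAMPLE = """sites = \\
{'site_name': {'cache': True,
               'channel_timeout': 3.0,
               'channels': 5,
               'connect_retry': 4.0,
               'heartbeat': (5, 2.0),
               'query_timeout': 120.0,
               'socket': ('remote_host_name', 6557)}}
"""

print(tcp_endpoints(load_sites(EXAMPLE)))
# prints {'site_name': ('remote_host_name', 6557)}
```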
On 08/09/15 08:57, Marcel Schulte wrote:
Hi Paul,
You have to enable the Livestatus TCP port (defaults
to 6557) at the remote site and allow access to that port
through the firewall.

HTH,
Marcel

On 08/09/15 08:50, Paul Bongers <Paul.Bongers@osudio.com> wrote:

Hi list,
I'm trying to set up distributed WATO on a new server.
However, I'm running into trouble as the remote site
is running on a machine behind a restricted firewall.

What ports should be opened up to make this possible?
Both sites are running OMD 1.2.6p10.
I'm planning to use liveproxyd for accessing
Livestatus data.

--
Met vriendelijke groet / Best regards,
Paul Bongers
Application Engineer
_______________________________________________
checkmk-en mailing list
checkmk-en@lists.mathias-kettner.de
http://lists.mathias-kettner.de/mailman/listinfo/checkmk-en

We’ll meet in Munich for the 2nd Check_MK Conference!
Book your place now and be part of it.
October 18th-20th, 2015
http://mathias-kettner.com/conference