[Check_mk (english)] 1.4.0p34 python check_mk.py --keepalive process is coredumping.

Hi

A couple of months ago we upgraded our master server from SuSE 11 SP4 to SuSE 12 SP3, and from Check_mk 1.2.8p27 to 1.4.0p34.

Since then the server has been using a lot more CPU than before, and the check_mk.py --keepalive process keeps dumping core, with no further information to be found.

Like this:

dkamonp-ns01:/etc/sysconfig # coredumpctl dump
PID: 43202 (python)
UID: 109 (nagios01)
GID: 1000 (nagios01)
Signal: 11 (SEGV)
Timestamp: Tue 2018-10-16 14:04:30 CEST (1s ago)
Command Line: python /omd/sites/nagios01/share/check_mk/modules/check_mk.py --keepalive
Executable: /opt/omd/versions/1.4.0p34.cee/bin/python2.7
Control Group: /system.slice/sshd.service
Unit: sshd.service
Slice: system.slice
Boot ID: 699e6f42aa774226a7b9fc36b42b244e
Machine ID: 419d15c7672e90948173fbd353576920
Hostname: dkamonp-ns01
Message: Process 43202 (python) of user 109 dumped core.
Refusing to dump core to tty.

This happens roughly every 4 seconds on average.

These are the lines from /var/log/messages:

2018-10-16T14:24:36.346230+02:00 dkamonp-ns01 kernel: [11744.456266] python[47529]: segfault at ffffffffffffffff ip 00007ffb79970339 sp 00007fffe625f6b0 error 5 in libpython2.7.so.1.0[7ffb798a4000+21b000]
2018-10-16T14:24:36.360628+02:00 dkamonp-ns01 systemd-coredump[48417]: Core Dumping has been disabled for process 47529 (python).
2018-10-16T14:24:36.361051+02:00 dkamonp-ns01 systemd-coredump[48417]: Process 47529 (python) of user 109 dumped core.
2018-10-16T14:24:37.170229+02:00 dkamonp-ns01 kernel: [11745.277670] traps: python[47997] general protection ip:7f8bac1d3339 sp:7ffccb870930 error:0 in libpython2.7.so.1.0[7f8bac107000+21b000]traps:
2018-10-16T14:24:37.184636+02:00 dkamonp-ns01 systemd-coredump[48419]: Core Dumping has been disabled for process 47997 (python).
2018-10-16T14:24:37.185053+02:00 dkamonp-ns01 systemd-coredump[48419]: Process 47997 (python) of user 109 dumped core.
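
Since the two processes load libpython2.7.so.1.0 at different base addresses, the raw ip values in the kernel lines cannot be compared directly. A small sketch like the one below subtracts the mapping base from the instruction pointer to get the offset inside the library. The helper name crash_offset and the regular expression are my own illustration, not part of Check_mk, and the pattern assumes the kernel message layout seen in the two lines above.

```python
import re

# Assumed layout: "... ip <hex> ..." or "... ip:<hex> ...", followed by
# "in <library>[<base>+<size>]". Only tested against the two lines below.
CRASH_RE = re.compile(
    r"\bip[ :](?P<ip>[0-9a-f]+).* in (?P<lib>\S+?)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# Kernel parts of the two /var/log/messages entries quoted in this post.
SAMPLE_LINES = [
    "python[47529]: segfault at ffffffffffffffff ip 00007ffb79970339 "
    "sp 00007fffe625f6b0 error 5 in libpython2.7.so.1.0[7ffb798a4000+21b000]",
    "traps: python[47997] general protection ip:7f8bac1d3339 "
    "sp:7ffccb870930 error:0 in libpython2.7.so.1.0[7f8bac107000+21b000]",
]


def crash_offset(line):
    """Return (library, offset of ip inside the library), or None if no match."""
    match = CRASH_RE.search(line)
    if match is None:
        return None
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    return match.group("lib"), ip - base


for sample in SAMPLE_LINES:
    result = crash_offset(sample)
    if result is not None:
        library, offset = result
        print("%s + %s" % (library, hex(offset)))
```

For the two lines above, both offsets come out as 0xcc339, so both crashes appear to hit the same location in libpython2.7.so.1.0. If debug symbols for that library are available, the offset could be looked up with a tool such as addr2line or gdb.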

We have gone through all Python environment references and corrected them, with no result.

The site is actually still running, but very slowly, and sometimes with a lot of timeouts and errors where it cannot connect to the services.

Has anyone seen anything like this?

We have also tried a fresh server with a clean install of 1.4.0p34 and a new site, and that works.

As soon as we import and restore a backup from the original site, the core dumps come back.

/Sune Folkmann