CRIT alarm from postfix instance

CMK version: 2.1.0p9
OS version: Ubuntu 20.04 (running CMK in docker container)

Error message: I constantly get a CRITICAL alert with Service Description “Postfix status” and Summary/Details “Status: PID file exists but instance is not running!”. Postfix is running and working in my instance. I figured out that the alert is triggered by this line:

The issue is that readlink -- "/proc/${postfix_pid}/exe" returns an empty string in my case, so then the alert is triggered.

What can I do to successfully monitor my postfix instance?

The question here is why the PID stored in /var/spool/postfix/pid/master.pid does not belong to a running process.

Thank you very much, @andreas-doehler, for your reply!

I agree. I don’t have much experience with postfix, but you can run the simple commands below to reproduce the issue with the official check-mk-raw docker image. Maybe this is an issue with the image?

docker container run --rm -d -p 8080:5000 -p 8000:8000 --tmpfs /opt/omd/sites/cmk/tmp:uid=1000,gid=1000 -v /omd/sites --name monitoring -v /etc/localtime:/etc/localtime:ro -e MAIL_RELAY_HOST='mailrelay.mydomain.com' --hostname monitoring.mydomain.com checkmk/check-mk-raw:2.1.0-latest
docker exec -it monitoring bash
postfix_pid=$(sed 's/ //g' </var/spool/postfix/pid/master.pid)
readlink -- "/proc/${postfix_pid}/exe"

The issue is that the last line returns an empty string. This is why checkmk reports CRIT status for the postfix service. But the postfix service is indeed running (was started in the entry point script because the MAIL_RELAY_HOST env var was set). You can check this with service postfix status.
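
An empty result from readlink can mean two different things: the PID in master.pid no longer exists, or the process exists but its exe link cannot be read from the current context (for example because of missing permissions). A minimal sketch to tell these cases apart, assuming a Linux /proc filesystem; diagnose_pid is a hypothetical helper name and is not part of the Checkmk agent:

```shell
#!/bin/sh
# Hypothetical diagnostic helper (an assumption, not Checkmk agent code).
# Usage: diagnose_pid PID
diagnose_pid() {
    pid=$1
    # No /proc/<pid> directory means there is no such process.
    if [ -z "$pid" ] || [ ! -d "/proc/${pid}" ]; then
        echo "no-such-process"
        return 1
    fi
    # The process exists; try to resolve its executable.
    if exe=$(readlink -- "/proc/${pid}/exe" 2>/dev/null) && [ -n "$exe" ]; then
        echo "running: ${exe}"
        return 0
    fi
    # The process exists, but the exe link could not be read.
    echo "exists-but-exe-unreadable"
    return 2
}
```

Inside the container this could be called as diagnose_pid "$(sed 's/ //g' </var/spool/postfix/pid/master.pid)". If it prints exists-but-exe-unreadable, the process is alive and only the readlink step is failing.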

Any idea?

@tog I am having the same issue. I have the PID file in the right location, and when I test the bash script, I get that postfix is up and running. But the problem dashboard still lists postfix as not running. Have you made any progress on this?

@JeremyC No, unfortunately not. As you can see from my post above, the issue occurs using the official docker image. I don’t have much experience with checkmk and postfix, but I assume there is an issue with the postfix installation in the official docker image.

I could easily run one or multiple commands in the post-create hook of the entry point script. But somebody would have to point out a solution …

Does anybody have any idea about this? Can anybody try to reproduce this? (Actually I suspect this to be an issue with the official docker image.)

I’m having the same issue on a native setup on Ubuntu 18.04, so it may not be Docker related.

@mslone That is interesting to hear. Please let me/us know if you find a solution. (The issue is still present in my instance.)

@mslone Did you find a solution for this?

I have the same problem with the official Docker image.
Has anyone found a solution yet?

This looks like a bug that was fixed by the change “Enhance detection of postfix installation”. Updating to at least v2.1.0p16 should therefore fix the problem.

Hi,
New user here on the dockerized free trial version.
I am also having a hard time with this “Status: PID file exists but instance is not running!” issue.
My agent is on Version: 2.1.0p20, OS: linux, Agent plugins: 0, Local checks: 0.
I have postfix running in another docker container and I am able to send emails through that.
So, yeah something is up.
Here is the relevant docker log:

### PREPARE POSTFIX (Hostname: 739be51280ca, Relay host: 192.168.1.50:2525)
### STARTING MAIL SERVICES
739be51280ca syslogd: /dev/xconsole: No such file or directory
postfix/postfix-script: starting the Postfix mail system
...done.
### STARTING XINETD

Just an update, the email alert is working as expected. Still there is this error…

Hi @dennis.ehmer

Thank you very much for your response and your hint. Looked really promising … I now updated to v2.1.0p20, but the alert is still there. Do I have to do something in order to kind of “force” the check to reset?

@Imperial7693 also reported recently that he experiences the issue with 2.1.0p20 …

Also having an issue with “STARTING MAIL SERVICES”:

...
checkmk  | ### STARTING MAIL SERVICES
checkmk  | syslogd: timed out waiting for child
checkmk exited with code 0
...

and that’s on a loop…
I am running checkmk in a Docker container, alongside an SMTP relay container (namshi/docker-smtp).

This is exactly my issue: to avoid a time-out when starting CMK in Docker I need to suppress the Relay env variable, which then means postfix is not started.

I have bodged this with a custom up -d script which runs a command in the running CMK to start postfix but clearly there is still an issue in the raw docker image when Relay is set on.

Has anyone figured this one out? No matter which image tag I use it’s the same:

checkmk  | ### STARTING MAIL SERVICES
checkmk  | syslogd: timed out waiting for child

Host is Arch Linux if it matters somehow…

Hi, it’s me again.

Checked this post again since the error still persists.

@dennis.ehmer: As already stated above, I am now using v2.1.0p20 of the check_mk_raw container and agent, but the issue still persists. Do I maybe have to do something in order to kind of “force” the check to reset?

@Imperial7693, @predmijat: My issue is maybe not the same as yours. Mail services start up without problems in my case and e-mail notifications also work. service postfix status returns “postfix is running”. Even so, the “Postfix status” service reports a critical state. As stated above, the issue is that readlink -- "/proc/${postfix_pid}/exe" returns an empty string. I also tried cd /proc/${postfix_pid}/exe and cat /proc/${postfix_pid}/exe, but both return “permission denied”.

Does anybody have any other ideas …?

Just to be sure: The agent was also updated to at least v2.1.0p20, not only the site, right?

Yes, both the site and the agent run on version 2.1.0p26.
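
One way to double-check the agent version directly on the monitored host is to read the Version: line from the agent's own output. A minimal sketch, assuming the agent prints a line of the form "Version: 2.1.0p20" near the top of its output; agent_version is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not shipped with Checkmk):
# reads agent output on stdin and prints the value of the first
# "Version:" line it finds.
agent_version() {
    awk '/^Version:/ { print $2; exit }'
}
```

On the host this could be used as check_mk_agent | agent_version, and the result compared with the version of the site.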