CRIT alarm from postfix instance

tog · August 23, 2022, 6:17am

CMK version: 2.1.0p9
OS version: Ubuntu 20.04 (running CMK in docker container)

Error message: I constantly get a CRITICAL alert with Service Description “Postfix status” and Summary/Details “Status: PID file exists but instance is not running!”. Postfix is running and working in my instance. I figured out that the alert is triggered by this line:

github.com/tribe29/checkmk/blob/master/agents/check_mk_agent.linux#L1002

          # Postfix status monitoring. Can handle multiple instances.
          if inpath postfix; then
              echo "<<<postfix_mailq_status:sep(58)>>>"
              for i in /var/spool/postfix*/; do
                  if [ -e "${i}/pid/master.pid" ]; then
                      if [ -r "${i}/pid/master.pid" ]; then
                          postfix_pid=$(sed 's/ //g' <"${i}/pid/master.pid") # handle possible spaces in output
                          if readlink -- "/proc/${postfix_pid}/exe" | grep -q ".*postfix/\(s\?bin/\)\?master.*"; then
                              echo "${i}:the Postfix mail system is running:PID:${postfix_pid}" | sed 's/\/var\/spool\///g'
                          else
                              echo "${i}:PID file exists but instance is not running!" | sed 's/\/var\/spool\///g'
                          fi
                      else
                          echo "${i}:PID file exists but is not readable"
                      fi
                  else
                      echo "${i}:the Postfix mail system is not running" | sed 's/\/var\/spool\///g'
                  fi
              done
          fi

The issue is that readlink -- "/proc/${postfix_pid}/exe" returns an empty string in my case, so the grep does not match and the alert is triggered.
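To try this logic outside the agent, the section quoted above can be wrapped in a function whose spool root is a parameter. This is a minimal sketch, not the shipped agent code: the function name postfix_status_section and its spool_root argument are hypothetical, and it strips the spool root with parameter expansion on every line instead of piping through sed. It shows that when readlink prints nothing, grep -q has no line to match, so the "not running!" branch is taken even if the process exists.

```shell
#!/bin/sh
# Hypothetical re-implementation of the agent's postfix_mailq_status section.
# spool_root defaults to /var/spool; pass a scratch directory to experiment.
postfix_status_section() {
    spool_root=${1:-/var/spool}
    echo "<<<postfix_mailq_status:sep(58)>>>"
    for i in "$spool_root"/postfix*/; do
        name=${i#"$spool_root"/}    # e.g. "postfix/", like the agent's sed
        if [ -e "${i}/pid/master.pid" ]; then
            if [ -r "${i}/pid/master.pid" ]; then
                postfix_pid=$(sed 's/ //g' <"${i}/pid/master.pid")
                # If readlink prints nothing (no such PID, or access denied),
                # grep sees no matching line and the else branch runs.
                if readlink -- "/proc/${postfix_pid}/exe" | grep -q ".*postfix/\(s\?bin/\)\?master.*"; then
                    echo "${name}:the Postfix mail system is running:PID:${postfix_pid}"
                else
                    echo "${name}:PID file exists but instance is not running!"
                fi
            else
                echo "${name}:PID file exists but is not readable"
            fi
        else
            echo "${name}:the Postfix mail system is not running"
        fi
    done
}
```

Called without an argument on the Postfix host, it should print essentially the same lines as the agent section, which makes it easy to run by hand as different users.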

What can I do to successfully monitor my postfix instance?

andreas-doehler · August 23, 2022, 8:10pm

The question here is why the process with the PID from /var/spool/postfix/pid/master.pid does not exist or is not running.
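One way to look into that on the affected host is to check whether /proc/<pid> exists at all before looking at its exe link. The helper below is a hypothetical sketch (describe_pid is not part of Checkmk): it separates "no such process" from "process exists but its exe link cannot be resolved", and also prints /proc/<pid>/comm, which is normally readable even when exe is not.

```shell
#!/bin/sh
# Hypothetical diagnostic, not part of the Checkmk agent.
# Return codes: 0 = process exists, 1 = no such process, 2 = bad input.
describe_pid() {
    pid=$1
    case $pid in
        ''|*[!0-9]*) echo "invalid PID: '${pid}'"; return 2 ;;
    esac
    if [ ! -d "/proc/${pid}" ]; then
        echo "PID ${pid}: no such process"
        return 1
    fi
    exe=$(readlink -- "/proc/${pid}/exe" 2>/dev/null)
    comm=$(cat "/proc/${pid}/comm" 2>/dev/null)
    if [ -n "$exe" ]; then
        echo "PID ${pid}: running, comm=${comm}, exe=${exe}"
    else
        # The process exists, but the exe link could not be read.
        echo "PID ${pid}: running, comm=${comm}, exe not resolvable"
    fi
}
```

It could be called as describe_pid "$(sed 's/ //g' </var/spool/postfix/pid/master.pid)", mirroring how the agent reads the PID file.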

tog · August 24, 2022, 7:23am

Thank you very much, @andreas-doehler, for your reply!

I agree. I don’t have much experience with Postfix, but you can run the simple commands below to reproduce the issue with the official check-mk-raw Docker image. Maybe this is an issue with the image?

docker container run --rm -d -p 8080:5000 -p 8000:8000 --tmpfs /opt/omd/sites/cmk/tmp:uid=1000,gid=1000 -v /omd/sites --name monitoring -v /etc/localtime:/etc/localtime:ro -e MAIL_RELAY_HOST='mailrelay.mydomain.com' --hostname monitoring.mydomain.com checkmk/check-mk-raw:2.1.0-latest
docker exec -it monitoring bash
postfix_pid=$(sed 's/ //g' </var/spool/postfix/pid/master.pid)
readlink -- "/proc/${postfix_pid}/exe"

The issue is that the last line returns an empty string. This is why Checkmk reports a CRIT status for the Postfix service. But the Postfix service is indeed running (it was started by the entrypoint script because the MAIL_RELAY_HOST environment variable was set). You can check this with service postfix status.

Any idea?

JeremyC · August 25, 2022, 8:39pm

@tog I am having the same issue. I have the PID file in the right location, and when I test the bash script, it reports that Postfix is up and running. But the problem dashboard still lists Postfix as not running. Have you made any progress on this?

tog · August 26, 2022, 4:55am

@JeremyC No, unfortunately not. As you can see from my post above, the issue occurs with the official Docker image. I don’t have much experience with Checkmk and Postfix, but I suspect an issue with the Postfix installation in the official Docker image.

I could easily run one or more commands in the post-create hook of the entrypoint script, but somebody would have to point out a solution…

tog · September 2, 2022, 5:22pm

Does anybody have any idea about this? Can anybody try to reproduce this? (Actually I suspect this to be an issue with the official docker image.)

mslone · October 27, 2022, 6:00pm

I’m having the same issue on a native setup on Ubuntu 18.04, so it may not be Docker related.

tog · October 28, 2022, 5:31am

@mslone That is interesting to hear. Please let me/us know if you find a solution. (The issue is still present in my instance.)

tog · January 6, 2023, 2:16pm

@mslone Did you find a solution for this?

majestro84 · January 30, 2023, 9:02am

I have the same problem with the official Docker image.
Has anyone found a solution yet?

dennis.ehmer · February 6, 2023, 2:06pm

This looks like a bug that was fixed with the change “Enhance detection of postfix installation”. Thus, updating to at least v2.1.0p16 should fix the problem.
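To check whether an installed version meets that v2.1.0p16 minimum, the pN patch suffix has to be compared numerically (p9 is older than p16), which a plain string comparison gets wrong. A minimal sketch, assuming GNU coreutils sort -V; version_at_least is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if Checkmk version $1 >= version $2.
# GNU "sort -V" compares digit runs numerically, so 2.1.0p9 < 2.1.0p16.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

For example, version_at_least 2.1.0p20 2.1.0p16 && echo ok prints ok, while version_at_least 2.1.0p9 2.1.0p16 fails.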

Imperial7693 · February 7, 2023, 7:44am

Hi,
New user here, on the dockerized free trial version.
I am also having a hard time with this “Status: PID file exists but instance is not running!” issue.
My agent is on Version: 2.1.0p20, OS: linux, Agent plugins: 0, Local checks: 0.
I have Postfix running in another Docker container, and I am able to send emails through it.
So, yeah, something is up.
Here is the relevant Docker log:

### PREPARE POSTFIX (Hostname: 739be51280ca, Relay host: 192.168.1.50:2525)
### STARTING MAIL SERVICES
739be51280ca syslogd: /dev/xconsole: No such file or directory
postfix/postfix-script: starting the Postfix mail system
...done.
### STARTING XINETD

Imperial7693 · February 7, 2023, 1:26pm

Just an update: the email alerts are working as expected. Still, the error is there…

tog · February 8, 2023, 6:26am

Hi @dennis.ehmer

Thank you very much for your response and your hint. It looked really promising… I have now updated to v2.1.0p20, but the alert is still there. Do I have to do something to kind of “force” the check to reset?

@Imperial7693 also reported recently that he experiences the issue with 2.1.0p20…

predmijat · February 17, 2023, 12:50pm

Also having an issue with “STARTING MAIL SERVICES”:

...
checkmk  | ### STARTING MAIL SERVICES
checkmk  | syslogd: timed out waiting for child
checkmk exited with code 0
...

and that’s in a loop…
I’m running Checkmk in a Docker container alongside an ‘smtp relay’ container (GitHub - namshi/docker-smtp: SMTP docker container).

b3lt3r · February 20, 2023, 10:19am

This is exactly my issue: to avoid a timeout when starting CMK in Docker, I need to leave out the relay env variable, which then means Postfix is not started.

I have bodged this with a custom up -d script that runs a command in the running CMK container to start Postfix, but clearly there is still an issue in the raw Docker image when the relay is set.

predmijat · April 17, 2023, 4:59am

Has anyone figured this one out? No matter which image tag I use, it’s the same:

checkmk  | ### STARTING MAIL SERVICES
checkmk  | syslogd: timed out waiting for child

Host is Arch Linux if it matters somehow…

tog · April 19, 2023, 1:03pm

Hi, it’s me again.

Checked this post again since the error still persists.

@dennis.ehmer: As already stated above, I now use v2.1.0p20 of the check_mk_raw container and agent, but the issue still persists. Do I maybe have to do something to kind of “force” the check to reset?

@Imperial7693, @predmijat: My issue may not be the same as yours. In my case, mail services start up seamlessly and e-mail notifications work too. service postfix status reports that Postfix is running, yet the “Postfix status” service reports a critical state. As stated above, the issue is that readlink -- "/proc/${postfix_pid}/exe" returns an empty string. I also tried cd /proc/${postfix_pid}/exe and cat /proc/${postfix_pid}/exe, but both return “permission denied”.

Does anybody have any other idea…?
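Since cd and cat on /proc/${postfix_pid}/exe report “permission denied”, it may help to compare the user the test runs as with the owner of the process, and to see whether readlink itself fails. A minimal sketch, assuming GNU coreutils (stat -c); proc_access_report is a hypothetical name. A shell opened with docker exec and the agent do not necessarily run as the same user, so the result only describes the shell it is run in.

```shell
#!/bin/sh
# Hypothetical diagnostic: who am I, who owns the process,
# and can this shell resolve /proc/<pid>/exe? Requires GNU stat.
proc_access_report() {
    pid=$1
    case $pid in
        ''|*[!0-9]*) echo "invalid PID: '${pid}'"; return 2 ;;
    esac
    if ! owner=$(stat -c %u "/proc/${pid}" 2>/dev/null); then
        echo "PID ${pid}: no such process"
        return 1
    fi
    if readlink -- "/proc/${pid}/exe" >/dev/null 2>&1; then
        link=ok
    else
        link=failed
    fi
    echo "euid=$(id -u) owner_uid=${owner} readlink=${link}"
}
```

Running it once in a docker exec shell and once in the context the agent uses would show whether both see the same readlink result.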

dennis.ehmer · April 21, 2023, 2:41pm

Just to be sure: The agent was also updated to at least v2.1.0p20, not only the site, right?
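To confirm which version the agent on a host actually reports (as opposed to the site), its own output can be checked. A minimal sketch, assuming the output contains a <<<check_mk>>> section with a Version: line, as the 2.1 Linux agent does; agent_version is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the version from the <<<check_mk>>> section
# of agent output read on stdin (e.g. check_mk_agent | agent_version).
agent_version() {
    awk '
        /^<<<check_mk>>>/       { in_sec = 1; next }
        /^<<</                  { in_sec = 0 }
        in_sec && /^Version: /  { print $2; exit }
    '
}
```

Only Version: lines inside the <<<check_mk>>> section are considered, so a similarly named line in another section is ignored.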

tog · April 21, 2023, 2:54pm

Yes, both the site and the agent run version 2.1.0p26.