Basic info (versions, verbose outputs, etc.)
CMK version: 1.6.0p11
OS version: Amazon Linux 2 (CentOS/RHEL/Fedora family)
Error message: -none-
Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)
<<<check_mk>>>
Version: 1.6.0p11
AgentOS: linux
Hostname: ip-10-80-2-221.us-east-2.compute.internal
AgentDirectory: /etc/check_mk
DataDirectory: /var/lib/check_mk_agent
SpoolDirectory: /var/lib/check_mk_agent/spool
PluginsDirectory: /usr/lib/check_mk_agent/plugins
LocalDirectory: /usr/lib/check_mk_agent/local
# (some agent sections cut here due to the post character limit)
<<<nfsmounts>>>
<<<cifsmounts>>>
<<<mounts>>>
/dev/nvme0n1p1 / xfs rw,noatime,attr2,inode64,noquota 0 0
<<<ps>>>
dummy section -- refer to section ps_lnx
<<<ps_lnx>>>
[header] CGROUP USER VSZ RSS TIME ELAPSED PID COMMAND
- root 43952 5776 00:00:08 02:00:35 1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
- root 0 0 00:00:00 02:00:35 2 [kthreadd]
- root 0 0 00:00:00 02:00:35 4 [kworker/0:0H]
- root 0 0 00:00:00 02:00:35 6 [mm_percpu_wq]
- root 0 0 00:00:02 02:00:35 7 [ksoftirqd/0]
- root 0 0 00:00:06 02:00:35 8 [rcu_sched]
- root 0 0 00:00:00 02:00:35 9 [rcu_bh]
- root 0 0 00:00:00 02:00:35 10 [migration/0]
- root 0 0 00:00:00 02:00:35 11 [watchdog/0]
- root 0 0 00:00:00 02:00:35 12 [cpuhp/0]
- root 0 0 00:00:00 02:00:35 13 [cpuhp/1]
- root 0 0 00:00:00 02:00:35 14 [watchdog/1]
- root 0 0 00:00:00 02:00:35 15 [migration/1]
- root 0 0 00:00:02 02:00:35 16 [ksoftirqd/1]
- root 0 0 00:00:00 02:00:35 18 [kworker/1:0H]
- root 0 0 00:00:00 02:00:35 19 [kdevtmpfs]
- root 0 0 00:00:00 02:00:35 20 [netns]
- root 0 0 00:00:00 02:00:35 129 [khungtaskd]
- root 0 0 00:00:00 02:00:35 179 [oom_reaper]
- root 0 0 00:00:00 02:00:35 180 [writeback]
- root 0 0 00:00:00 02:00:35 181 [kcompactd0]
- root 0 0 00:00:00 02:00:35 183 [ksmd]
- root 0 0 00:00:00 02:00:35 184 [khugepaged]
- root 0 0 00:00:00 02:00:35 185 [crypto]
- root 0 0 00:00:00 02:00:35 186 [kintegrityd]
- root 0 0 00:00:00 02:00:35 187 [kblockd]
- root 0 0 00:00:00 02:00:35 291 [md]
- root 0 0 00:00:00 02:00:35 298 [edac-poller]
- root 0 0 00:00:00 02:29 300 [kworker/u4:1]
- root 0 0 00:00:00 02:00:35 303 [watchdogd]
- root 0 0 00:00:00 02:00:35 430 [kauditd]
- root 0 0 00:00:00 02:00:35 436 [kswapd0]
1:name=systemd:/system.slice/omd.service tfprod 1055876 9120 00:00:00 01:59 517 /omd/sites/tfprod/bin/nagios -ud /omd/sites/tfprod/tmp/nagios/nagios.cfg
- root 0 0 00:00:00 02:00:34 568 [kthrotld]
- root 0 0 00:00:00 02:00:34 612 [kstrp]
- root 0 0 00:00:00 02:00:34 639 [ipv6_addrconf]
- root 0 0 00:00:00 01:39 673 [kworker/0:2]
- root 0 0 00:00:00 02:00:33 1235 [nvme-wq]
- root 0 0 00:00:00 02:00:33 1345 [xfsalloc]
- root 0 0 00:00:00 02:00:33 1346 [xfs_mru_cache]
- root 0 0 00:00:00 02:00:33 1348 [xfs-buf/nvme0n1]
- root 0 0 00:00:00 02:00:33 1349 [xfs-data/nvme0n]
- root 0 0 00:00:00 02:00:33 1350 [xfs-conv/nvme0n]
- root 0 0 00:00:00 02:00:33 1351 [xfs-cil/nvme0n1]
- root 0 0 00:00:00 02:00:33 1352 [xfs-reclaim/nvm]
- root 0 0 00:00:00 02:00:33 1353 [xfs-log/nvme0n1]
- root 0 0 00:00:00 02:00:33 1354 [xfs-eofblocks/n]
- root 0 0 00:00:02 02:00:33 1355 [xfsaild/nvme0n1]
- root 0 0 00:00:00 02:00:33 1356 [kworker/0:1H]
1:name=systemd:/system.slice/systemd-journald.service root 123036 59628 00:00:00 02:00:32 1420 /usr/lib/systemd/systemd-journald
- root 0 0 00:00:00 02:00:32 1430 [ena]
1:name=systemd:/system.slice/lvm2-lvmetad.service root 116824 2076 00:00:00 02:00:32 1436 /usr/sbin/lvmetad -f
1:name=systemd:/system.slice/systemd-udevd.service root 45512 3660 00:00:00 02:00:32 1439 /usr/lib/systemd/systemd-udevd
1:name=systemd:/user.slice/user-0.slice/session-c2.scope root 120060 3100 00:00:00 00:00 1528 /bin/bash /usr/bin/check_mk_agent --debug -vvn hostname
1:name=systemd:/user.slice/user-0.slice/session-c2.scope root 54016 3628 00:00:00 00:00 1551 ps ax -o cgroup:512,user:32,vsz,rss,cputime,etime,pid,command --columns 10000
- root 0 0 00:00:00 02:00:32 1826 [kworker/1:1H]
- root 0 0 00:00:00 02:00:31 1900 [rpciod]
- root 0 0 00:00:00 02:00:31 1901 [xprtiod]
1:name=systemd:/system.slice/auditd.service root 64332 2364 00:00:00 02:00:31 1904 /sbin/auditd
1:name=systemd:/system.slice/libstoragemgmt.service libstoragemgmt 12656 1864 00:00:00 02:00:31 1929 /usr/bin/lsmd -d
1:name=systemd:/system.slice/rpcbind.service rpc 73828 3580 00:00:00 02:00:31 1930 /sbin/rpcbind -w
1:name=systemd:/system.slice/dbus.service dbus 60548 4484 00:00:01 02:00:31 1934 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
1:name=systemd:/system.slice/chronyd.service chrony 122644 4028 00:00:00 02:00:31 1936 /usr/sbin/chronyd
1:name=systemd:/system.slice/rngd.service root 13144 924 00:00:00 02:00:31 1941 /sbin/rngd -f
1:name=systemd:/system.slice/irqbalance.service root 99916 1580 00:00:00 02:00:31 1943 /usr/sbin/irqbalance --foreground --hintpolicy=subset
1:name=systemd:/system.slice/systemd-logind.service root 28520 3016 00:00:00 02:00:31 1945 /usr/lib/systemd/systemd-logind
1:name=systemd:/system.slice/gssproxy.service root 208356 3540 00:00:00 02:00:31 1949 /usr/sbin/gssproxy -D
1:name=systemd:/system.slice/network.service root 105452 4272 00:00:00 02:00:29 2159 /sbin/dhclient -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid -H ip-10-80-2-220 eth0
1:name=systemd:/system.slice/network.service root 105452 3920 00:00:00 02:00:28 2258 /sbin/dhclient -6 -nw -lf /var/lib/dhclient/dhclient6--eth0.lease -pf /var/run/dhclient6-eth0.pid eth0 -H ip-10-80-2-220
1:name=systemd:/system.slice/httpd.service root 394032 25076 00:00:00 02:00:28 2303 /usr/sbin/httpd -DFOREGROUND
1:name=systemd:/system.slice/xinetd.service root 27412 2596 00:00:00 02:00:28 2306 /usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid
1:name=systemd:/system.slice/pdagent.service pdagent 373136 21212 00:00:18 02:00:28 2312 /usr/bin/python /usr/share/pdagent/bin/pdagentd.py -f
1:name=systemd:/system.slice/httpd.service apache 292344 5644 00:00:00 02:00:27 2352 /usr/sbin/httpd -DFOREGROUND
1:name=systemd:/system.slice/httpd.service apache 632088 13336 00:00:01 02:00:27 2363 /usr/sbin/httpd -DFOREGROUND
1:name=systemd:/system.slice/postfix.service root 90416 4884 00:00:00 02:00:27 2444 /usr/libexec/postfix/master -w
1:name=systemd:/system.slice/postfix.service postfix 90580 6788 00:00:00 02:00:27 2446 qmgr -l -t unix -u
1:name=systemd:/system.slice/amazon-ssm-agent.service root 1024616 35104 00:00:04 02:00:26 2497 /usr/bin/amazon-ssm-agent
1:name=systemd:/system.slice/rsyslog.service root 741028 30824 00:00:00 02:00:26 2499 /usr/sbin/rsyslogd -n
1:name=systemd:/system.slice/sshd.service root 112928 7212 00:00:00 02:00:26 2504 /usr/sbin/sshd -D
1:name=systemd:/system.slice/crond.service root 135152 3380 00:00:00 02:00:26 2507 /usr/sbin/crond -n
1:name=systemd:/system.slice/atd.service root 27920 2240 00:00:00 02:00:26 2509 /usr/sbin/atd -f
1:name=systemd:/system.slice/system-serial\x2dgetty.slice/serial-getty@ttyS0.service root 120984 2248 00:00:00 02:00:26 2528 /sbin/agetty --keep-baud 115200,38400,9600 ttyS0 vt220
1:name=systemd:/system.slice/system-getty.slice/getty@tty1.service root 121336 1764 00:00:00 02:00:26 2529 /sbin/agetty --noclear tty1 linux
1:name=systemd:/system.slice/acpid.service root 4308 104 00:00:00 02:00:25 2566 /usr/sbin/acpid
1:name=systemd:/system.slice/omd.service tfprod 228548 28696 00:00:04 02:00:24 2581 python /omd/sites/tfprod/bin/mkeventd
1:name=systemd:/system.slice/omd.service tfprod 710808 19252 00:00:21 02:00:24 2591 /omd/sites/tfprod/bin/rrdcached -t 4 -w 3600 -z 1800 -f 7200 -s tfprod -m 660 -l unix:/omd/sites/tfprod/tmp/run/rrdcached.sock -p /omd/sites/tfprod/tmp/rrdcached.pid -j /omd/sites/tfprod/var/rrdcached -o /omd/sites/tfprod/var/log/rrdcached.log
1:name=systemd:/system.slice/omd.service tfprod 156072 1852 00:00:00 02:00:24 2610 /omd/sites/tfprod/bin/npcd -d -f /omd/sites/tfprod/etc/pnp4nagios/npcd.cfg
1:name=systemd:/system.slice/omd.service tfprod 123328 6528 00:00:00 02:00:23 2674 /usr/sbin/httpd -f /omd/sites/tfprod/etc/apache/apache.conf
1:name=systemd:/system.slice/omd.service tfprod 123304 3668 00:00:00 02:00:23 2678 /usr/sbin/httpd -f /omd/sites/tfprod/etc/apache/apache.conf
1:name=systemd:/system.slice/omd.service tfprod 452616 174572 00:00:10 02:00:23 2680 /usr/sbin/httpd -f /omd/sites/tfprod/etc/apache/apache.conf
1:name=systemd:/system.slice/omd.service tfprod 379120 5648 00:00:00 02:00:23 2683 stunnel /omd/sites/tfprod/etc/stunnel/server.conf
1:name=systemd:/system.slice/omd.service tfprod 27412 2504 00:00:00 02:00:23 2691 /omd/sites/tfprod/var/tmp/xinetd -pidfile /omd/sites/tfprod/tmp/run/xinetd.pid -filelog /omd/sites/tfprod/var/log/xinetd.log -f /omd/sites/tfprod/etc/xinetd.conf
1:name=systemd:/system.slice/omd.service tfprod 452464 174232 00:00:06 01:59:53 2984 /usr/sbin/httpd -f /omd/sites/tfprod/etc/apache/apache.conf
1:name=systemd:/system.slice/omd.service tfprod 438016 159912 00:00:07 01:46:40 9395 /usr/sbin/httpd -f /omd/sites/tfprod/etc/apache/apache.conf
1:name=systemd:/system.slice/omd.service tfprod 434580 158724 00:00:10 01:45:57 9870 /usr/sbin/httpd -f /omd/sites/tfprod/etc/apache/apache.conf
1:name=systemd:/system.slice/omd.service tfprod 457224 179200 00:00:11 01:45:39 10075 /usr/sbin/httpd -f /omd/sites/tfprod/etc/apache/apache.conf
- root 0 0 00:00:00 38:41 13639 [kworker/u4:2]
1:name=systemd:/system.slice/omd.service tfprod 297204 23116 00:00:00 01:36:53 14822 /usr/bin/php-cgi -d session.save_handler=files -d session.save_path=/omd/sites/tfprod/tmp/php/session -d upload_tmp_dir=/omd/sites/tfprod/tmp/php/upload -d soap.wsdl_cache_dir=/omd/sites/tfprod/tmp/php/wsdl-cache -d safe_mode=Off -d mysql.default_socket=/omd/sites/tfprod/tmp/run/mysqld/mysqld.sock
1:name=systemd:/system.slice/httpd.service apache 435412 13376 00:00:01 01:36:51 14903 /usr/sbin/httpd -DFOREGROUND
1:name=systemd:/system.slice/httpd.service apache 435412 13376 00:00:01 01:36:51 14910 /usr/sbin/httpd -DFOREGROUND
1:name=systemd:/system.slice/httpd.service apache 435412 13376 00:00:01 01:36:51 14912 /usr/sbin/httpd -DFOREGROUND
1:name=systemd:/system.slice/httpd.service apache 435412 13376 00:00:01 01:36:50 14999 /usr/sbin/httpd -DFOREGROUND
1:name=systemd:/system.slice/httpd.service apache 435412 13376 00:00:01 01:36:50 15007 /usr/sbin/httpd -DFOREGROUND
1:name=systemd:/system.slice/httpd.service apache 435412 13376 00:00:01 01:36:50 15013 /usr/sbin/httpd -DFOREGROUND
1:name=systemd:/system.slice/httpd.service apache 435412 13376 00:00:01 01:36:50 15019 /usr/sbin/httpd -DFOREGROUND
1:name=systemd:/system.slice/httpd.service apache 435412 13448 00:00:01 01:35:28 15809 /usr/sbin/httpd -DFOREGROUND
1:name=systemd:/system.slice/httpd.service apache 435408 13324 00:00:01 01:35:28 15810 /usr/sbin/httpd -DFOREGROUND
- root 0 0 00:00:00 28:41 18528 [kworker/0:1]
1:name=systemd:/system.slice/postfix.service postfix 90504 6948 00:00:00 19:23 23530 pickup -l -t unix -u
- root 0 0 00:00:00 17:38 24327 [kworker/1:0]
- root 0 0 00:00:00 16:42 24896 [kworker/1:1]
- root 0 0 00:00:01 01:16:26 26020 [kworker/u4:0]
- root 0 0 00:00:00 10:41 28253 [kworker/1:3]
- root 0 0 00:00:00 08:41 29319 [kworker/0:0]
- root 0 0 00:00:00 07:41 29827 [kworker/0:4]
- root 0 0 00:00:00 04:41 31536 [kworker/1:2]
1:name=systemd:/system.slice/amazon-ssm-agent.service root 726592 27336 00:00:00 02:37 32564 /usr/bin/ssm-session-worker jacksmith-k3dquxuzoytlgkyqvyryucl6uq i-073b45b9400d658aa
1:name=systemd:/system.slice/amazon-ssm-agent.service delkind 124264 3508 00:00:00 02:36 32577 sh
1:name=systemd:/system.slice/amazon-ssm-agent.service root 216940 6332 00:00:00 02:33 32595 sudo su -
1:name=systemd:/user.slice/user-0.slice/session-c2.scope root 192608 4036 00:00:00 02:33 32596 su -
1:name=systemd:/user.slice/user-0.slice/session-c2.scope root 125056 4308 00:00:00 02:33 32597 -bash
<<<mem>>>
MemTotal: 3990584 kB
MemFree: 1185736 kB
MemAvailable: 2520568 kB
Buffers: 2088 kB
Cached: 1327440 kB
SwapCached: 0 kB
Active: 1132984 kB
Inactive: 1113284 kB
Active(anon): 919492 kB
Inactive(anon): 14524 kB
Active(file): 213492 kB
Inactive(file): 1098760 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 6291452 kB
SwapFree: 6291452 kB
Dirty: 51152 kB
Writeback: 0 kB
AnonPages: 916752 kB
Mapped: 152052 kB
Shmem: 17880 kB
Slab: 345248 kB
SReclaimable: 310608 kB
SUnreclaim: 34640 kB
KernelStack: 3984 kB
PageTables: 19960 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 8286744 kB
Committed_AS: 2213472 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 67560 kB
DirectMap2M: 3028992 kB
DirectMap1G: 1048576 kB
<<<cpu>>>
0.51 0.35 0.45 1/249 1556 2
30895
<<<uptime>>>
7235.96 11575.68
<<<lnx_if>>>
[start_iplink]
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 06:d4:87:42:e0:e4 brd ff:ff:ff:ff:ff:ff
inet 10.80.2.220/24 brd 10.80.2.255 scope global dynamic eth0
valid_lft 2485sec preferred_lft 2485sec
inet6 fe80::4d4:87ff:fe42:e0e4/64 scope link
valid_lft forever preferred_lft forever
[end_iplink]
<<<lnx_if:sep(58)>>>
eth0: 629108496 269823 0 0 0 0 0 0 25059959 252669 0 0 0 0 0 0
lo: 33140748 37216 0 0 0 0 0 0 33140748 37216 0 0 0 0 0 0
[lo]
Link detected: yes
Address: 00:00:00:00:00:00
[eth0]
Link detected: yes
Address: 06:d4:87:42:e0:e4
<<<tcp_conn_stats>>>
01 3
0A 10
06 15
<<<diskstat>>>
1746722203
259 0 nvme0n1 62316 11 1693526 40392 134686 34226 4281400 127556 0 30508 115128
[dmsetup_info]
No devices found
<<<kernel>>>
1746722203
nr_free_pages 296093
nr_zone_inactive_anon 3631
nr_zone_active_anon 229863
nr_zone_inactive_file 274701
nr_zone_active_file 53373
nr_zone_unevictable 0
nr_zone_write_pending 12793
nr_mlock 0
nr_page_table_pages 4995
nr_kernel_stack 4000
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 71978296
numa_miss 0
numa_foreign 0
numa_interleave 24796
numa_local 71978296
numa_other 0
nr_inactive_anon 3631
nr_active_anon 229863
nr_inactive_file 274701
nr_active_file 53373
nr_unevictable 0
nr_slab_reclaimable 77652
nr_slab_unreclaimable 8676
nr_isolated_anon 0
nr_isolated_file 0
workingset_refault 0
workingset_activate 0
workingset_nodereclaim 0
nr_anon_pages 229190
nr_mapped 38013
nr_file_pages 332387
nr_dirty 12793
nr_writeback 0
nr_writeback_temp 0
nr_shmem 4470
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_anon_transparent_hugepages 0
nr_unstable 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_dirtied 402165
nr_written 358882
nr_dirty_threshold 118785
nr_dirty_background_threshold 59320
pgpgin 846763
pgpgout 2140700
pswpin 0
pswpout 0
pgalloc_dma 0
pgalloc_dma32 68286457
pgalloc_normal 3891853
pgalloc_movable 0
allocstall_dma 0
allocstall_dma32 0
allocstall_normal 0
allocstall_movable 0
pgskip_dma 0
pgskip_dma32 0
pgskip_normal 0
pgskip_movable 0
pgfree 72475058
pgactivate 115183
pgdeactivate 0
pglazyfree 612
pgfault 103753281
pgmajfault 34682
pglazyfreed 0
pgrefill 0
pgsteal_kswapd 0
pgsteal_direct 0
pgscan_kswapd 0
pgscan_direct 0
pgscan_direct_throttle 0
zone_reclaim_failed 0
pginodesteal 0
slabs_scanned 0
kswapd_inodesteal 0
kswapd_low_wmark_hit_quickly 0
kswapd_high_wmark_hit_quickly 0
pageoutrun 0
pgrotated 0
drop_pagecache 0
drop_slab 0
oom_kill 0
numa_pte_updates 0
numa_huge_pte_updates 0
numa_hint_faults 0
numa_hint_faults_local 0
numa_pages_migrated 0
pgmigrate_success 0
pgmigrate_fail 0
compact_migrate_scanned 0
compact_free_scanned 0
compact_isolated 0
compact_stall 0
compact_fail 0
compact_success 0
compact_daemon_wake 0
compact_daemon_migrate_scanned 0
compact_daemon_free_scanned 0
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 0
unevictable_pgs_scanned 0
unevictable_pgs_rescued 0
unevictable_pgs_mlocked 0
unevictable_pgs_munlocked 0
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
thp_fault_alloc 0
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_file_alloc 0
thp_file_mapped 0
thp_split_page 0
thp_split_page_failed 0
thp_deferred_split_page 0
thp_split_pmd 0
thp_split_pud 0
thp_zero_page_alloc 0
thp_zero_page_alloc_failed 0
thp_swpout 0
thp_swpout_fallback 0
swap_ra 0
swap_ra_hit 0
cpu 224635 0 43908 1154462 2896 0 595 18786 0 0
cpu0 111248 0 21856 578529 1448 0 290 9270 0 0
cpu1 113386 0 22052 575933 1448 0 304 9516 0 0
intr 3755389 50 11 0 0 254 0 0 0 0 0 0 0 88 0 0 0 0 0 0 0 0 0 0 0 71469 73222 7245 230324 233135 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 5995352
btime 1746714967
processes 66317
procs_running 1
procs_blocked 0
softirq 5781137 0 1232220 1431 488457 0 0 1054 646386 0 3411589
<<<md>>>
Personalities :
unused devices: <none>
<<<vbox_guest>>>
<<<chrony:cached(1746722171,30)>>>
Reference ID : A9FEA97B (169.254.169.123)
Stratum : 4
Ref time (UTC) : Thu May 08 16:35:58 2025
System time : 0.000000597 seconds fast of NTP time
Last offset : +0.000000177 seconds
RMS offset : 0.000000734 seconds
Frequency : 10.254 ppm slow
Residual freq : +0.000 ppm
Skew : 0.019 ppm
Root delay : 0.000278320 seconds
Root dispersion : 0.000070091 seconds
Update interval : 16.3 seconds
Leap status : Normal
<<<postfix_mailq>>>
[[[]]]
QUEUE_deferred 0 0
QUEUE_active 0 0
<<<postfix_mailq_status:sep(58)>>>
postfix/:the Postfix mail system is running:PID:2444
<<<labels:sep(0)>>>
{"cmk/check_mk_server": "yes"}
<<<omd_status:cached(1746722111,60)>>>
[tfprod]
mkeventd 0
rrdcached 0
npcd 0
nagios 0
apache 0
stunnel 0
xinetd 0
crontab 0
OVERALL 0
<<<mknotifyd:sep(0)>>>
1746722203
<<<omd_apache:sep(124)>>>
[tfprod]
<<<livestatus_status:sep(59)>>>
[tfprod]
accept_passive_host_checks;accept_passive_service_checks;average_latency_cmk;average_latency_generic;average_latency_real_time;cached_log_messages;check_external_commands;check_host_freshness;check_service_freshness;connections;connections_rate;core_pid;enable_event_handlers;enable_flap_detection;enable_notifications;execute_host_checks;execute_service_checks;external_command_buffer_max;external_command_buffer_slots;external_command_buffer_usage;external_commands;external_commands_rate;forks;forks_rate;has_event_handlers;helper_usage_cmk;helper_usage_generic;helper_usage_real_time;host_checks;host_checks_rate;interval_length;last_command_check;last_log_rotation;livechecks;livechecks_rate;livestatus_active_connections;livestatus_overflows;livestatus_overflows_rate;livestatus_queued_connections;livestatus_threads;livestatus_usage;livestatus_version;log_messages;log_messages_rate;mk_inventory_last;nagios_pid;neb_callbacks;neb_callbacks_rate;num_hosts;num_queued_alerts;num_queued_notifications;num_services;obsess_over_hosts;obsess_over_services;process_performance_data;program_start;program_version;requests;requests_rate;service_checks;service_checks_rate
1;1;0;0.0999218;0;0;1;1;1;4;0.0131807;517;1;1;1;1;1;0;32768;0;0;0;183;1.94337;0;0;0;0;88;0.836341;60;1746722203;0;0;0;1;0;0;0;20;0;1.6.0p11;900;0.32284;0;517;2490;12.4492;44;0;0;845;0;0;1;1746722083;3.5.1;5;0.0141203;834;7.84923
<<<livestatus_ssl_certs:sep(124)>>>
[tfprod]
/omd/sites/tfprod/etc/ssl/ca.pem|33091841259
/omd/sites/tfprod/etc/ssl/sites/tfprod.pem|33091841259
<<<mkeventd_status:sep(0)>>>
["tfprod"]
[["status_config_load_time", "status_num_open_events", "status_virtual_memory_size", "status_messages", "status_message_rate", "status_average_message_rate", "status_rule_tries", "status_rule_trie_rate", "status_average_rule_trie_rate", "status_rule_hits", "status_rule_hit_rate", "status_average_rule_hit_rate", "status_drops", "status_drop_rate", "status_average_drop_rate", "status_overflows", "status_overflow_rate", "status_average_overflow_rate", "status_events", "status_event_rate", "status_average_event_rate", "status_connects", "status_connect_rate", "status_average_connect_rate", "status_average_processing_time", "status_average_request_time", "status_average_sync_time", "status_replication_slavemode", "status_replication_last_sync", "status_replication_success", "status_event_limit_host", "status_event_limit_rule", "status_event_limit_overall", "status_event_limit_active_hosts", "status_event_limit_active_rules", "status_event_limit_active_overall"], [1746714979.162112, 0, 234033152, 0, 0.0, 0.0, 0, 0.0, 0.0, 0, 0.0, 0.0, 0, 0.0, 0.0, 0, 0.0, 0.0, 0, 0.0, 0.0, 129, 0.0, 0.016450142543335183, 0.0, 0.0005386570432657074, 0.0, "master", 0.0, false, 1000, 1000, 10000, [], [], false]]
<<<mkbackup>>>
[[[site:tfprod:nightly]]]
{
"bytes_per_second": 1103765.0956048495,
"finished": 1746678654.926181,
"next_schedule": 1746763200.0,
"output": "2025-05-08 04:00:02 --- Starting backup (Check_MK-ip+10+80+2+220.us+east+2.compute.internal-tfprod-nightly to BackupTarget) ---\n2025-05-08 04:30:52 Verifying backup consistency\n2025-05-08 04:30:54 Cleaning up previously completed backup\n2025-05-08 04:30:54 --- Backup completed (Duration: 0:30:52, Size: 891.19 MB, IO: 1.05 MB/s) ---\n",
"pid": 23618,
"size": 934477223,
"started": 1746676802.072707,
"state": "finished",
"success": true
}
<<<job>>>
<<<local>>>
Issue: We have a self-deployed OMD/Checkmk setup in the cloud, running multiple OMD instances; it is our primary monitoring system and covers a few sites. We have notification rules that trigger PagerDuty alerts.
A few weeks ago, without any change on our side, the PagerDuty alerts stopped working in one site only. We have a global notification rule that applies to all sites; it has always worked and has not been modified, and the same PagerDuty token is used regardless of the OMD region. Yet for some reason, when the notification comes from that one specific problematic region, the alert never shows up in PagerDuty, even though the log looks clean and even shows a response from PagerDuty, like so:
2025-05-08 16:09:24 ----------------------------------------------------------------------
2025-05-08 16:09:24 Got raw notification (ecs-instance-REDACTED-i-REDACTED.us-east-2;Check_MK) context with 52 variables
2025-05-08 16:09:24 Global rule 'High urgency notifications for Prod envs'...
2025-05-08 16:09:24 -> matches!
[...]
2025-05-08 16:09:24 Executing 1 notifications:
2025-05-08 16:09:24 * notifying pagerduty via pagerduty-agent, parameters: c09ad8---REDACTED----322812311, bulk: no
2025-05-08 16:09:24 executing /omd/sites/redacted/local/share/check_mk/notifications/pagerduty-agent
2025-05-08 16:09:24 Output: Event processed. Incident Key: ('event_source=service;host_name=ecs-instance-docker--redacted--i-05bbf1e8f92533703.us-east-2;service_desc=Check_MK', [])
The response looks exactly the same as in the other regions, yet only in that specific region does it fail to actually create the alert in PagerDuty.
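In case it helps, I can also trigger that local script by hand from the (redacted) site user. The sketch below uses the standard Checkmk NOTIFY_* context variables; which of them the local pagerduty-agent script actually reads, and the exact values, are assumptions on my part, so treat it only as an approximation of the real call:

# run as the site user, from the site home directory (assumed variable set)
env \
  NOTIFY_WHAT=SERVICE \
  NOTIFY_NOTIFICATIONTYPE=PROBLEM \
  NOTIFY_HOSTNAME="ecs-instance-REDACTED-i-REDACTED.us-east-2" \
  NOTIFY_SERVICEDESC="Check_MK" \
  NOTIFY_SERVICESTATE=CRIT \
  NOTIFY_SERVICEOUTPUT="manual notification test" \
  NOTIFY_PARAMETER_1="c09ad8---REDACTED----322812311" \
  ./local/share/check_mk/notifications/pagerduty-agent

Run this way it prints the same "Event processed. Incident Key: ..." output as in the log above.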
I've tried everything I can think of: restarting the server, creating a new notification group, creating a new PagerDuty service, and even sending an API request to PagerDuty from that server with cURL (the cURL request worked, but the notification from Checkmk still does not).
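For reference, the cURL test I ran from that server looked roughly like this (reconstructed from memory; I used the Events API v2 enqueue endpoint with the same redacted integration key the notification rule uses):

curl -s -X POST https://events.pagerduty.com/v2/enqueue \
  -H "Content-Type: application/json" \
  -d '{
        "routing_key": "c09ad8---REDACTED----322812311",
        "event_action": "trigger",
        "payload": {
          "summary": "Manual test from the problematic OMD site",
          "source": "ip-10-80-2-220.us-east-2.compute.internal",
          "severity": "critical"
        }
      }'

That request did create an incident, so outbound connectivity and the key itself appear to be fine from that host.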
I'd really appreciate any help on the matter. Thanks in advance.