Hello,
Issue
The check heartbeat_crm.resources (seen in versions 1.2.6 through 1.4.0) is not activated in check_mk while the cluster status is OK. As a result, an error in the cluster does not trigger a notification; it only shows up as 'unchecked services'.
Example
pcs status
3 nodes configured
2 resources configured
Online: [ ]
Full list of resources:
SQUID-VIP (ocf:IPaddr2): Started
FPX-Squid (ocf:Squid): Started
Failed Actions:
* FPX-Squid_monitor_10000 on prdloyfpxl01.keylanehosting.local 'unknown error' (1): call=17, status=complete, exitreason='',
    last-rc-change='Mon Nov 12 18:10:46 2018', queued=0ms, exec=0ms
Check_mk output
<<<heartbeat_crm>>>
Stack: cman
Current DC: <host2> (version 1.1.18-3.el6-bfe4e80420) - partition with quorum
Last updated: Wed Nov 14 16:23:44 2018
Last change: Fri Oct 5 19:20:53 2018 by root via crm_attribute on <host2>
2 nodes configured
2 resources configured
Online: [ ]
Full list of resources:
FPX-VIP (ocf:IPaddr2): Started
FPX-Squid (ocf::heartbeat:Squid): Started <host1>
Failed Actions:
* FPX-Squid_monitor_10000 on prekeyfpxl01.mgmt.keylanehosting.local 'not running' (7): call=13, status=complete, exitreason='',
    last-rc-change='Wed Nov 14 16:22:35 2018', queued=0ms, exec=0ms
Improvement
Create the check up front, even when no failed actions are present, so that when a failed action does occur its output is parsed by this check and can raise a notification.
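To illustrate the idea, here is a minimal sketch in plain Python. It is not the shipped heartbeat_crm check and does not use the check_mk check API. The names `parse_failed_actions` and `failed_actions_state` are hypothetical, and it assumes the agent section is available as raw text lines with ASCII quotes, as crm_mon prints them. The point is that a state is always returned, so the service can exist while the cluster is healthy.

```python
import re

# Hypothetical helper, not part of check_mk: matches one "Failed Actions" entry such as
# * FPX-Squid_monitor_10000 on <node> 'not running' (7): call=13, ...
FAILED_ACTION_RE = re.compile(
    r"^[*-]\s+(?P<action>\S+) on (?P<node>\S+) '(?P<reason>[^']*)' \((?P<rc>\d+)\)"
)


def parse_failed_actions(lines):
    """Collect the entries that follow the 'Failed Actions:' header."""
    failed = []
    in_block = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("Failed Actions:"):
            in_block = True
            continue
        if not in_block:
            continue
        match = FAILED_ACTION_RE.match(stripped)
        if match:  # continuation lines (last-rc-change=...) do not match
            failed.append((match.group("action"), match.group("node"),
                           match.group("reason"), int(match.group("rc"))))
    return failed


def failed_actions_state(lines):
    """Always return a (state, text) pair, even when nothing has failed."""
    failed = parse_failed_actions(lines)
    if not failed:
        return 0, "No failed actions"  # 0 = OK in the Nagios convention
    details = ", ".join("%s on %s: %s (rc=%d)" % entry for entry in failed)
    return 2, "%d failed action(s): %s" % (len(failed), details)  # 2 = CRIT
```

With the agent output shown above, `failed_actions_state` would report the FPX-Squid monitor failure as CRIT. With no "Failed Actions:" block it returns OK instead of producing no service at all.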
To reproduce
Run the following command on a node of the pcs cluster. Our test cluster runs Squid.
Command > sudo killall squid
Then re-run check_mk_agent and inspect the output.
Stefan Baas
Infrastructure Specialist
Keylane
T +31 88 404 59 26
M +31 6 465 216 07