[Check_mk (english)] Improvement for heartbeat_crm.resources

keylane_sbaas · December 28, 2018, 5:20pm

Hello,

Issue
The check heartbeat_crm.resources (versions 1.2.6 to 1.4.0) is not activated in check_mk while the cluster status is OK. As a result, an error in the cluster does not trigger a notification; it only shows up as ‘unchecked services’.

Example

pcs status

3 nodes configured

2 resources configured

Online: [ ]

Full list of resources:

SQUID-VIP (ocf:IPaddr2): Started

FPX-Squid (ocf:Squid): Started

Failed Actions:

FPX-Squid_monitor_10000 on prdloyfpxl01.keylanehosting.local 'unknown error' (1): call=17, status=complete, exitreason='',

last-rc-change='Mon Nov 12 18:10:46 2018', queued=0ms, exec=0ms

Check_mk output

···

<<<heartbeat_crm>>>

Stack: cman

Current DC: <host2> (version 1.1.18-3.el6-bfe4e80420) - partition with quorum

Last updated: Wed Nov 14 16:23:44 2018

Last change: Fri Oct  5 19:20:53 2018 by root via crm_attribute on <host2>

2 nodes configured

2 resources configured

Online: [ ]

Full list of resources:

FPX-VIP (ocf:IPaddr2): Started

FPX-Squid       (ocf::heartbeat:Squid): Started <host1>

Failed Actions:

* FPX-Squid_monitor_10000 on prekeyfpxl01.mgmt.keylanehosting.local 'not running' (7): call=13, status=complete, exitreason='',

last-rc-change='Wed Nov 14 16:22:35 2018', queued=0ms, exec=0ms

Improvement

Pre-create the check so that it already exists when a failed action occurs; the "Failed Actions" output can then be parsed by this check.
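To illustrate the idea, here is a minimal Python sketch of how the "Failed Actions" part of the agent output could be parsed. It is not the code of the existing check; the names parse_failed_actions and failed_actions_state are made up for this example, and the line format is assumed from the output pasted above.

```python
import re

# Hypothetical pattern, derived only from the sample output in this thread:
#   * <action> on <node> '<reason>' (<rc>): call=..., status=..., ...
# Straight and typographic quotes are both accepted.
FAILED_ACTION = re.compile(
    r"^\*?\s*(?P<action>\S+) on (?P<node>\S+) "
    r"['\u2018\u2019](?P<reason>[^'\u2018\u2019]*)['\u2018\u2019] "
    r"\((?P<rc>\d+)\)"
)


def parse_failed_actions(section_text):
    """Collect the entries listed after the 'Failed Actions:' header."""
    failed = []
    in_failed = False
    for raw in section_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "Failed Actions:":
            in_failed = True
            continue
        if not in_failed:
            continue
        match = FAILED_ACTION.match(line)
        if match:  # continuation lines (last-rc-change=...) do not match
            entry = match.groupdict()
            entry["rc"] = int(entry["rc"])
            failed.append(entry)
    return failed


def failed_actions_state(failed):
    """Return (state, summary) with 0 = OK and 2 = CRIT (assumed mapping)."""
    if not failed:
        return 0, "No failed actions"
    details = ", ".join(
        "%s on %s (%s, rc=%d)" % (f["action"], f["node"], f["reason"], f["rc"])
        for f in failed
    )
    return 2, "Failed actions: " + details
```

Because the service would report OK when the list is empty, it could be created during discovery while the cluster is healthy and would then turn critical as soon as a failed action appears.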

To reproduce

Use the following command in a pcs cluster. Our test cluster runs squid.

Command > sudo killall squid

Then re-run check_mk_agent and look at the output.

Stefan Baas
Infrastructure Specialist
Keylane

T +31 88 404 59 26

M +31 6 465 216 07

E stefan.baas@keylane.com

www.keylane.com

sebkir · November 12, 2020, 8:26am

Because someone mentioned this topic in a support ticket, I am also posting the answer to this question here.

If you want the check heartbeat_crm.resources to be able to go critical, you have to nail down the expected state during discovery.

Please do the following now:

While the pcs resources are enabled:

Create a rule from the rule set Heartbeat CRM Discovery. In the rule, select the option "Mark the nodes of the resources as preferred one". Limit this rule to the appropriate host(s).
Run a discovery on the host(s).

If you now disable the pcs resource (pcs resource disable ClusterIP), the check goes critical.
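The effect of pinning the state at discovery time can be sketched in Python as follows. This is only an illustration of the idea, not Check_mk's actual implementation: the function names, the state mapping (0 = OK, 2 = CRIT) and the node names node1/node2 are assumptions, and the resource line format is taken from the agent output quoted earlier in this thread.

```python
import re

# Assumed line format: <name> (<type>): <state> [<node>]
RESOURCE = re.compile(
    r"^(?P<name>\S+)\s+\((?P<type>[^)]+)\):\s+(?P<state>\S+)\s*(?P<node>\S*)"
)


def parse_resources(lines):
    """Map resource name -> (state, node or None)."""
    resources = {}
    for line in lines:
        match = RESOURCE.match(line.strip())
        if match:
            node = match.group("node") or None
            resources[match.group("name")] = (match.group("state"), node)
    return resources


def discover(resources):
    """Remember, per started resource, the node it ran on at discovery."""
    return {
        name: node
        for name, (state, node) in resources.items()
        if state == "Started"
    }


def check(name, expected_node, resources):
    """Compare the current state with what was remembered at discovery."""
    if name not in resources:
        return 2, "%s: not found in agent output" % name
    state, node = resources[name]
    if state != "Started":
        return 2, "%s: %s" % (name, state)
    if expected_node and node != expected_node:
        return 2, "%s: on %s, expected %s" % (name, node, expected_node)
    return 0, "%s: Started on %s" % (name, node)


# Discovery while the resource is enabled, then a check after disabling it.
at_discovery = parse_resources(["ClusterIP (ocf::heartbeat:IPaddr2): Started node1"])
expected = discover(at_discovery)
after_disable = parse_resources(["ClusterIP (ocf::heartbeat:IPaddr2): Stopped"])
print(check("ClusterIP", expected["ClusterIP"], after_disable))
```

The point of the sketch is that the check needs a remembered reference (here the node stored by discover) to have something to compare against; without it there is nothing that could turn critical.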