CheckMK 2.1 - Monitoring Windows Fileserver Cluster - "check failed - please submit a crash report"

OgiOgi · February 17, 2023, 4:18pm

Hi.

Has anyone seen this error before when configuring monitoring of Windows Fileserver clusters?
Monitoring of SQL clusters with the MSSQL plugin works fine, but when monitoring the Filesystem service the monitoring blows up.

The cluster is configured as a cluster host with the two nodes:

I have applied rules to the hosts to configure the two Filesystem services as the cluster:

I have configured the "Aggregation options for clustered services" rule (which is not referenced in the CheckMK guide and example "Monitoring cluster services"):

If the aggregation rule is not set, I get this message:

The Crash Report shows…
Exception: IndexError (list index out of range)
Traceback:

File "/omd/sites/sysmon_slave_ocs/lib/python3/cmk/base/agent_based/checking/__init__.py", line 470, in get_aggregated_result
    result = _aggregate_results(
  File "/omd/sites/sysmon_slave_ocs/lib/python3/cmk/base/agent_based/checking/__init__.py", line 578, in _aggregate_results
    perfdata, results = _consume_and_dispatch_result_types(subresults)
  File "/omd/sites/sysmon_slave_ocs/lib/python3/cmk/base/agent_based/checking/__init__.py", line 622, in _consume_and_dispatch_result_types
    for subr in subresults:
  File "/omd/sites/sysmon_slave_ocs/lib/python3/cmk/base/agent_based/checking/_cluster_modes.py", line 134, in _cluster_check
    node_results=executor(check_function, cluster_kwargs),
  File "/omd/sites/sysmon_slave_ocs/lib/python3/cmk/base/agent_based/checking/_cluster_modes.py", line 282, in __call__
    elements = self._consume_checkresult(
  File "/omd/sites/sysmon_slave_ocs/lib/python3/cmk/base/agent_based/checking/_cluster_modes.py", line 327, in _consume_checkresult
    return list(result_generator)
  File "/omd/sites/sysmon_slave_ocs/lib/python3/cmk/base/api/agent_based/register/check_plugins.py", line 94, in filtered_generator
    for element in generator(*args, **kwargs):
  File "/omd/sites/sysmon_slave_ocs/lib/python3/cmk/base/api/agent_based/register/check_plugins_legacy.py", line 184, in check_result_generator
    subresults = list(subresults)
  File "/omd/sites/sysmon_slave_ocs/share/check_mk/checks/df", line 236, in check_df
    volume_name = [d.device for d in df_blocks if d.mountpoint == item][0]

Local Variables:

{'df_blocks': [('C:\\',
                'NTFS',
                81368.99609375,
                49679.19921875,
                0.0,
                'C:/',
                None)],
 'df_inodes': (),
 'item': 'D:/',
 'item_and_grouping': ('mountpoint',
                       'mountpoint',
                       <ItemBehaviour.default: 1>,
                       <ItemBehaviour.default: 1>),
 'mountpoint_for_block_devices': <ItemBehaviour.volume_name: 2>,
 'params': {'inodes_levels': (10.0, 5.0),
            'item_appearance': 'mountpoint',
            'levels': (96.0, 98.0),
            'levels_low': (50.0, 60.0),
            'magic': 0.8,
            'magic_normsize': 20,
            'mountpoint_for_block_devices': 'volume_name',
            'show_inodes': 'onlow',
            'show_levels': 'onmagic',
            'show_reserved': False,
            'show_volume_name': True,
            'trend_perfdata': True,
            'trend_range': 336,
            'trend_showtimeleft': True,
            'trend_timeleft': (96, 72)},
 'parsed': ((('C:\\',
              'NTFS',
              81368.99609375,
              49679.19921875,
              0.0,
              'C:/',
              None),),
            ()),
 'raw_df_blocks': [('C:/', 81368.99609375, 49679.19921875, 0.0)],
 'raw_df_inodes': []}
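
Reading the local variables above: the data passed to the check only contains the C:/ volume, while the service item is D:/. The list comprehension on the last traceback line is therefore empty, and taking [0] raises the IndexError. Below is a minimal sketch that reproduces this outside of CheckMK. It is not the CheckMK code; the DfBlock type, its field names other than device and mountpoint, and the two lookup helpers are assumptions made up for illustration.

```python
from typing import NamedTuple, Optional


class DfBlock(NamedTuple):
    # Hypothetical stand-in for one df_blocks entry; only `device` and
    # `mountpoint` are names taken from the traceback, the rest are guesses.
    device: str
    fs_type: Optional[str]
    size_mb: float
    avail_mb: float
    reserved_mb: float
    mountpoint: str
    uuid: Optional[str]


# Values copied from the crash report's local variables.
df_blocks = [
    DfBlock("C:\\", "NTFS", 81368.99609375, 49679.19921875, 0.0, "C:/", None)
]
item = "D:/"


def lookup_strict(blocks, wanted):
    # Same shape as the failing line: indexing an empty list raises IndexError.
    return [d.device for d in blocks if d.mountpoint == wanted][0]


def lookup_defensive(blocks, wanted):
    # Illustrative alternative: return None when no block matches the item.
    return next((d.device for d in blocks if d.mountpoint == wanted), None)


try:
    lookup_strict(df_blocks, item)
except IndexError as exc:
    print(f"strict lookup failed for {item}: {exc}")

print("defensive lookup for D:/ ->", lookup_defensive(df_blocks, item))
print("defensive lookup for C:/ ->", lookup_defensive(df_blocks, "C:/"))
```

The sketch only shows why the lookup fails when the D:/ item is evaluated against data that has no D:/ entry. It does not say why that data is missing for the cluster service.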

I have tried it against multiple fileserver clusters with the same error.

If anyone has any thoughts on how I can get this to work I would be very grateful.
Looking at the CheckMK guide, it should be straightforward, but I have managed to mess it up somehow.

albzundy · November 15, 2023, 4:45pm

Thank you very much for the information about the aggregation options rule!

Now it works.

The only difference I can see in my configuration is in the conditions of the clustered services rule.

Maybe too many services match (because you didn’t define any hosts in the rule), and they cannot all be aggregated on your cluster host?

Also, I didn’t define a preferred node. I just chose the type “Failover” in the aggregation rule, but I don’t think that’s important …
