AWS ELBv2 monitoring fails, and services occasionally appear to be cached

Hey all,
We are monitoring AWS accounts with CheckMK Raw Edition (Docker) and have had no issues until now.

We have a new account with roughly 50 load balancers in it, and it seems to be causing an issue with ELBv2 monitoring.

This individual ‘host’ also seems to have some caching issues. If I create a rule to collect EC2 information in one region only, the initial setup looks good. If I then switch to monitoring S3, remove that, and go back to EC2 only, I still see only the original S3 ‘services’, even after a full service discovery.

On the CLI I get the correct results, though: the EC2 services show up when I use --no-cache.
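
To see whether a stale fetcher cache file is involved, something like the following could help. It is only a sketch: the path and the 120-second discovery max age are copied from the fetcher output further down in this post, `cache_status` is a helper name I made up, and the stale services in the GUI may well have nothing to do with this file.

```python
import time
from pathlib import Path

# Path copied from the ProgramFetcher line in the CLI output; "host-name" is the placeholder host.
CACHE_FILE = Path("/omd/sites/cmk/tmp/check_mk/data_source_cache/special_aws/host-name")
# Seconds, copied from MaxAge(discovery=120) in the same output.
DISCOVERY_MAX_AGE = 120


def cache_status(path, max_age, now=None):
    """Return 'missing', 'fresh' or 'stale' for a cache file (hypothetical helper)."""
    if not path.exists():
        return "missing"
    now = time.time() if now is None else now
    age = now - path.stat().st_mtime
    return "fresh" if age <= max_age else "stale"


if __name__ == "__main__":
    print(CACHE_FILE, cache_status(CACHE_FILE, DISCOVERY_MAX_AGE))
```

Run as the site user, this prints whether the file exists and whether it is newer than the discovery max age.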

Now for the ELBv2 issue on this account: if I select ELBv2 only and run a scan, I get no results and no errors, nothing at all (the same happens when monitoring just DynamoDB or WAFv2).

If I go into the CLI and run the following:

cmk --debug --no-cache -vv -vII host-name

I then get the following results:

OMD[cmk]:~$ cmk --debug --no-cache -vv -vII host-name
Discovering services and host labels on: host-name
host-name:
+ FETCHING DATA
  Source: SourceType.HOST/FetcherType.PROGRAM
[cpu_tracking] Start [7f17bbb55f40]
Calling: /omd/sites/cmk/share/check_mk/agents/special/agent_aws '--regions' 'ap-southeast-2' '--services' 'elbv2' '--hostname' 'host-name'
STDIN (first 30 bytes): {"access_key_id": "redacted... (total 106 bytes)
[ProgramFetcher] Fetch with cache settings: DefaultAgentFileCache(base_path=PosixPath('/omd/sites/cmk/tmp/check_mk/data_source_cache/special_aws/host-name'), max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=True, use_outdated=False, simulation=False)
Not using cache (Cache usage disabled)
[ProgramFetcher] Execute data source
[cpu_tracking] Stop [7f17bbb55f40 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.57, children_system=0.2, elapsed=1.8100000005215406))]
Trying to acquire lock on /omd/sites/cmk/var/check_mk/crashes/base/5d096d68-50a8-11ec-b5e8-0242ac110002/crash.info
Got lock on /omd/sites/cmk/var/check_mk/crashes/base/5d096d68-50a8-11ec-b5e8-0242ac110002/crash.info
Releasing lock on /omd/sites/cmk/var/check_mk/crashes/base/5d096d68-50a8-11ec-b5e8-0242ac110002/crash.info
Released lock on /omd/sites/cmk/var/check_mk/crashes/base/5d096d68-50a8-11ec-b5e8-0242ac110002/crash.info
Traceback (most recent call last):
  File "/omd/sites/cmk/bin/cmk", line 92, in <module>
    exit_status = modes.call(mode_name, mode_args, opts, args)
  File "/omd/sites/cmk/lib/python3/cmk/base/modes/__init__.py", line 69, in call
    return handler(*handler_args)
  File "/omd/sites/cmk/lib/python3/cmk/base/modes/check_mk.py", line 1542, in mode_discover
    discovery.do_discovery(
  File "/omd/sites/cmk/lib/python3/cmk/base/discovery.py", line 370, in do_discovery
    fetcher_messages=list(
  File "/omd/sites/cmk/lib/python3/cmk/base/checkers/_checkers.py", line 247, in fetch_all
    raw_data = source.fetch()
  File "/omd/sites/cmk/lib/python3/cmk/base/checkers/_abstract.py", line 163, in fetch
    return fetcher.fetch(self.mode)
  File "/omd/sites/cmk/lib/python3/cmk/fetchers/_base.py", line 259, in fetch
    return result.OK(self._fetch(mode))
  File "/omd/sites/cmk/lib/python3/cmk/fetchers/_base.py", line 277, in _fetch
    raw_data = self._fetch_from_io(mode)
  File "/omd/sites/cmk/lib/python3/cmk/fetchers/program.py", line 133, in _fetch_from_io
    raise MKFetcherError("Agent exited with code %d: %s" %
cmk.utils.exceptions.MKFetcherError: Agent exited with code 1: 
OMD[cmk]:~$ 
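
Since the error text after "Agent exited with code 1:" is empty, calling the special agent directly and capturing its stderr might show more. A rough sketch, assuming the argument list from the "Calling:" line above; the STDIN JSON is truncated in the output, so it has to come from a file you prepare yourself (the file path and `run_agent` are my own placeholders):

```python
import subprocess
import sys

# Argument list copied from the "Calling:" line in the discovery output.
AGENT_CMD = [
    "/omd/sites/cmk/share/check_mk/agents/special/agent_aws",
    "--regions", "ap-southeast-2",
    "--services", "elbv2",
    "--hostname", "host-name",
]


def run_agent(cmd, stdin_bytes):
    """Run a command with the given STDIN; return (exit code, stdout, stderr)."""
    proc = subprocess.run(cmd, input=stdin_bytes, capture_output=True, check=False)
    return (
        proc.returncode,
        proc.stdout.decode(errors="replace"),
        proc.stderr.decode(errors="replace"),
    )


if __name__ == "__main__":
    # Hypothetical usage: python3 run_agent.py /path/to/agent_stdin.json
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as fh:
            payload = fh.read()
        code, out, err = run_agent(AGENT_CMD, payload)
        print("exit code:", code)
        print("stdout bytes:", len(out))
        print("stderr:", err or "(empty)")
    else:
        print("pass the path to a file containing the agent's STDIN JSON")
```

If stderr is still empty, that at least confirms the agent itself exits silently rather than the fetcher swallowing the message.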

Has anyone come across this before (both the caching issue and the ELBv2 issue)?
I have also found that monitoring DynamoDB on this account produces the same result. The services I have confirmed working are:

  • S3
  • EC2
  • EBS
  • CloudWatch Alarms
  • RDS

and the services I have confirmed are not working:

  • ELBv2
  • DynamoDB
  • WAFv2

It isn’t a role problem, as I have a common deployment across all accounts (and AWS CLI commands to list load balancers etc. work). This isn’t an issue in any other account, so my only guess is that the number of results is the cause?
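
To put a number on the "number of results" guess, the load balancers could be counted page by page with boto3. This is a sketch only: it assumes boto3 is installed and uses the same credentials as the agent, `count_load_balancers` is a name I made up, and it says nothing about how agent_aws itself pages through results.

```python
def count_load_balancers(client):
    """Return (total load balancers, number of API pages) for an ELBv2 client."""
    paginator = client.get_paginator("describe_load_balancers")
    total = 0
    pages = 0
    for page in paginator.paginate():
        pages += 1
        total += len(page.get("LoadBalancers", []))
    return total, pages


# Example usage (needs boto3 and valid credentials, so it is left commented out):
#
#   import boto3
#   client = boto3.client("elbv2", region_name="ap-southeast-2")
#   print(count_load_balancers(client))
```

Comparing the count between this account and one that works would show whether the size really differs enough to matter.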
