Graph recipe uses undefined metric

Dear CheckMK Community,

I wrote a local check for monitoring Redis Memory Usage, Keys and Keyspace hits/misses.

Check output

0 Redis-Memory mem_total=2122317824|mem_used=51508440|keyspace_hits=384685|keyspace_misses=44320|keyspace_hitmiss_ratio(%)=89.67|db2(KEYS)=21260 Memory Usage is: 49.12 MB / 1.98 GB

The check itself is working, but the Graphs are not (at least not anymore).
I don’t know the exact point in time when the graphs stopped working, but I assume it has something to do with the upgrade to CheckMK v2.0.0.

As far as I can see, the output is okay, but maybe I’m missing something here.

Error on Graphs

Graph recipe 'keyspace_hitmiss_ratio(%)' uses undefined metric 'max', available are: db2(KEYS), keyspace_hitmiss_ratio, keyspace_hitmiss_ratio(%), keyspace_hits, keyspace_misses, mem_total, mem_used

My code does not have a metric called ‘max’.
This error is the same across all Redis-Memory local check instances on all servers.

Setup Info
CheckMK Server 2.0.0p1 (CEE) on Ubuntu 20.04
Monitored Servers: Ubuntu 16.04 - Ubuntu 20.04 + Debian 10 (Buster)

Maybe some of you will see the problem I am obviously missing.
I substituted keyspace_hitmiss_ratio(%) with keyspace_hitmiss_ratio because I thought maybe the (%) is wrong since the new version, but that did not change anything.

Thanks for reading and trying to help :slight_smile:

Greetings,
Pixelpoint

I would avoid all special characters in performance value names; the ( and ) are problematic too.
Rewriting your check is not enough: you also have to remove the old performance data and the config files for this performance data.
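
A minimal sketch of how metric names in a local check line could be screened before deploying a check. The rule "letters, digits and underscores only" is an assumption taken from the advice above, not an official CheckMK definition, and the function names are hypothetical.

```python
import re

# Assumption: a "safe" metric name uses only ASCII letters, digits and underscores.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def split_local_line(line):
    """Split a local check line into (status, service, metrics, text)."""
    status, rest = line.strip().split(" ", 1)
    if rest.startswith('"'):
        # Quoted service names may contain spaces.
        service, rest = rest[1:].split('"', 1)
        rest = rest.lstrip()
    else:
        service, rest = rest.split(" ", 1)
    metrics, _, text = rest.partition(" ")
    return status, service, metrics, text


def unsafe_metric_names(line):
    """Return the metric names that contain characters outside SAFE_NAME."""
    _, _, metrics, _ = split_local_line(line)
    if metrics == "-":  # "-" means the check reports no metrics
        return []
    names = [metric.split("=", 1)[0] for metric in metrics.split("|")]
    return [name for name in names if not SAFE_NAME.fullmatch(name)]
```

Fed the check output from the first post, this would flag keyspace_hitmiss_ratio(%) and db2(KEYS). Lines with a cached(...) prefix are not handled by this sketch.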

Thank you for your answer.

How do I delete old performance data and config files for these?

I looked at the output of cmk --paths and tried deleting some files/folders with the corresponding HOSTNAME, but the check still shows the same error.

It still shows Graph recipe 'keyspace_hitmiss_ratio(%)' uses undefined metric even though the check does not use (%) anymore.

I even did a for file in $(find -name HOSTNAME); do rm -r $file; done in the homedirectory of our site user.

The location of the performance data depends on the version in use.
Take a look inside “~/var/pnp4nagios/perfdata” and “~/var/check_mk/rrd”.
There, inside the folder named after the host running your local checks, you will find some files named after your check. Move these files somewhere else and restart rrdcached and the monitoring core.
Then check whether new files are created.
What you see for your local check in the column “Service performance data (source code)” is also important.

So, I did omd stop.
Afterwards, folder ~/var/pnp4nagios/perfdata is empty.
Then, folder ~/var/check_mk/rrd/HOSTNAME has been deleted.

At first the check states that no historic metric records have been found.

After that, it still states the Graph recipe error.

Check Output

0 Redis-Memory mem_total=536870912|mem_used=1708352|keyspace_hits=2142524|keyspace_misses=42201|keyspace_hitmiss_ratio=98.07|db1(KEYS)=26|db2(KEYS)=225|db3(KEYS)=357 Memory Usage is: 1.63 MB / 512.0 MB

Service Performance Data (source code)

So it seems CheckMK has to have another file somewhere, storing the metrics output of local checks.

find -name HOSTNAME

./var/check_mk/logwatch/basic.datacycle.at-srv
./var/check_mk/rrd/basic.datacycle.at-srv
./tmp/check_mk/status_data/basic.datacycle.at-srv
./tmp/check_mk/piggyback_sources/basic.datacycle.at-srv
./tmp/check_mk/counters/basic.datacycle.at-srv
./tmp/check_mk/cache/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-xqtqv4pf-project-918-concurrent-0-a4ad2a852187a5e3-mvertes__alpine-mongo-1-wait-for-service/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-xqtqv4pf-project-618-concurrent-0-7c2b6bc0f9472d70-mvertes__alpine-mongo-1-wait-for-service/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-xqtqv4pf-project-590-concurrent-0-fa1bc7286807acc3-predefined-2/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-xqtqv4pf-project-590-concurrent-0-fa1bc7286807acc3-predefined-1/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-xqtqv4pf-project-590-concurrent-0-fa1bc7286807acc3-predefined-0/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-xqtqv4pf-project-590-concurrent-0-fa1bc7286807acc3-mvertes__alpine-mongo-1/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-xqtqv4pf-project-590-concurrent-0-fa1bc7286807acc3-git.pixelpoint.biz__data-cycle__data-cycle-core__postgres-0/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-xqtqv4pf-project-590-concurrent-0-fa1bc7286807acc3-build-3/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-xqtqv4pf-project-590-concurrent-0-834283628ea627d9-mvertes__alpine-mongo-1-wait-for-service/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-tyhmz2c4-project-547-concurrent-0-91cd9eda81daece2-predefined-2/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-tyhmz2c4-project-547-concurrent-0-91cd9eda81daece2-predefined-1/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-tyhmz2c4-project-547-concurrent-0-91cd9eda81daece2-predefined-0/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-tyhmz2c4-project-547-concurrent-0-91cd9eda81daece2-mvertes__alpine-mongo-1/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-tyhmz2c4-project-547-concurrent-0-91cd9eda81daece2-git.pixelpoint.biz__data-cycle__data-cycle-core__postgres-0/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-tyhmz2c4-project-547-concurrent-0-91cd9eda81daece2-build-3/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-jdb-wfp2-project-881-concurrent-0-4fc06cf83274a4d6-mvertes__alpine-mongo-1-wait-for-service/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-jdb-wfp2-project-618-concurrent-0-63083cbb25f3d50c-mvertes__alpine-mongo-1-wait-for-service/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-jdb-wfp2-project-546-concurrent-0-e864c6580640e44e-mvertes__alpine-mongo-1-wait-for-service/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-aj3sufzd-project-546-concurrent-0-58a7598d08682648-predefined-2/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-aj3sufzd-project-546-concurrent-0-58a7598d08682648-predefined-1/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-aj3sufzd-project-546-concurrent-0-58a7598d08682648-predefined-0/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-aj3sufzd-project-546-concurrent-0-58a7598d08682648-mvertes__alpine-mongo-1/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-aj3sufzd-project-546-concurrent-0-58a7598d08682648-git.pixelpoint.biz__data-cycle__data-cycle-core__postgres-0/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-8sztyl9-project-547-concurrent-0-e93874bd50a50705-mvertes__alpine-mongo-1-wait-for-service/basic.datacycle.at-srv
./tmp/check_mk/piggyback/runner-8sztyl9-project-547-concurrent-0-ce457e9286d246bb-mvertes__alpine-mongo-1-wait-for-service/basic.datacycle.at-srv
./tmp/check_mk/piggyback/peaceful_banach/basic.datacycle.at-srv
./tmp/check_mk/piggyback/imaginary/basic.datacycle.at-srv
./tmp/check_mk/piggyback/funny_dijkstra/basic.datacycle.at-srv

All of these have been deleted repeatedly via
for file in $(find -name HOSTNAME); do rm -r $file; done

Greetings,
Pixelpoint

Look at your service performance data (source code): the “(%)” is still there.
Why?
Your posted check output is not what you see in your performance data. Is it possible that your machine is still using the old script?

Do you mean the monitored Host?
No, this one cannot use the old data.

The Check Output is copied directly from the command line on the monitored Host.

I manually deleted the Local Check and copied over the new one before doing any of the commands posted above.

[EDIT]
The new check does not have any other changes except for renaming keyspace_hitmiss_ratio

Then please post the “Service performance data (source code)” how it looks now.
The last one cannot work.

This is a side-by-side screenshot.

Left side
Monitored Host executing pwd to show I’m in the CheckMK directory + executing CheckMK Local Check Redis + Output

Marked RED is output of pwd + Local Check Redis + (specifically) metric keyspace_hitmiss_ratio

Right side
CheckMK Fullscreen output of CheckMK Redis in WebUI.

Marked RED is Graph error + Service performance data (source code)

Thank you for helping,
Pixelpoint

Only what you get when the agent is queried from the monitoring system is relevant.
If the local section of your agent output also shows the same as on the left side, then I don’t know what it could be.
The right side does not come from the same script as the left.
Is it possible that you have some metric definitions in your system that rewrite the metric name?
In the default system there is no metric containing keyspace, so if this is the cause it must come from a self-written extension. But I don’t think so.
To check, you can grep over the local directory and search for keyspace.

CheckMK Server

cmk -npv --no-cache --detect-plugins=local HOSTNAME

Redis-Memory         Memory Usage is: 1.63 MB / 512.0 MB                      (mem_total=536870912;;;; mem_used=1708352;;;; keyspace_hits=2142524;;;; keyspace_misses=42201;;;; keyspace_hitmiss_ratio(%)=98.07;;;; db1(KEYS)=26;;;; db2(KEYS)=225;;;; db3(KEYS)=357;;;;)

Monitored Host CheckMK Agent Output

check_mk_agent -v | grep -A3 "<<<local"

<<<local:sep(0)>>>
0 "Proxmox Backup Client" fs_used=21090532658|fs_growth=41.9|fs_trend=41.9 OK 
0 Redis-Memory mem_total=536870912|mem_used=1708352|keyspace_hits=2142524|keyspace_misses=42201|keyspace_hitmiss_ratio=98.07|db1(KEYS)=26|db2(KEYS)=225|db3(KEYS)=357 Memory Usage is: 1.63 MB / 512.0 MB

So the CheckMK Agent on the monitored host IS reporting back keyspace_hitmiss_ratio but the CheckMK Server lists this as keyspace_hitmiss_ratio(%)

I do not have any rewrite / translate rules for metrics in place.
All my local check does is gather Information from redis and output metrics while giving them names which should clarify what they are meant to be.
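
For readers who have not seen the check itself, here is a rough sketch of how such a local check line could be assembled. It is not the code from check_redis.py: the function names are hypothetical, the metric names deliberately avoid special characters, and the guard against an empty keyspace (zero hits and zero misses) is an addition for illustration.

```python
def hit_ratio_percent(hits, misses):
    """Return hits / (hits + misses) as a percentage rounded to two decimals."""
    total = hits + misses
    if total == 0:
        # No lookups yet; avoid a ZeroDivisionError and report 0.0.
        return 0.0
    return round(hits / total * 100, 2)


def format_local_line(status, service, metrics, text):
    """Build one local check line: '<status> <service> <metrics> <text>'."""
    metric_part = "|".join(f"{name}={value}" for name, value in metrics.items())
    # A single "-" tells CheckMK that the service has no metrics.
    return f"{status} {service} {metric_part or '-'} {text}"


if __name__ == "__main__":
    hits, misses = 2142524, 42201
    metrics = {
        "keyspace_hits": hits,
        "keyspace_misses": misses,
        "keyspace_hitmiss_ratio": hit_ratio_percent(hits, misses),
    }
    print(format_local_line(0, "Redis-Memory", metrics, "Memory Usage is: 1.63 MB / 512.0 MB"))
```

Keeping the unit out of the metric name and putting it in the text part of the line sidesteps the (%) problem entirely.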

The keyspace metric does not come from an extension, it’s just the name my local check assigns to the information gathered.

GREP keyspace (on CheckMK Server in homedir of site user)
I separated the parts of the output that belong together with a newline.

Info:
I write checks, push them to git and symlink them to the appropriate directory, so the CheckMK Agent Bakery always gets the up-to-date version of a local check if I run git fetch && git pull before baking the agent.

grep -R keyspace ./*

Binary file ./bin/redis-cli matches
Binary file ./bin/redis-server matches
Binary file ./bin/redis-benchmark matches
Binary file ./bin/redis-sentinel matches
Binary file ./bin/redis-check-aof matches
Binary file ./bin/redis-check-rdb matches

./git/checkmk-local-checks/check_redis.py:        self.keyspace_hits = 0
./git/checkmk-local-checks/check_redis.py:        self.keyspace_misses = 0
./git/checkmk-local-checks/check_redis.py:            elif 'keyspace_hits:' in line:
./git/checkmk-local-checks/check_redis.py:                self.keyspace_hits = int(line.split(':')[1])
./git/checkmk-local-checks/check_redis.py:            elif 'keyspace_misses:' in line:
./git/checkmk-local-checks/check_redis.py:                self.keyspace_misses = int(line.split(':')[1])
./git/checkmk-local-checks/check_redis.py:    def get_keyspace_hits(self):
./git/checkmk-local-checks/check_redis.py:        """ returns keyspace_hits info from redis-cli """
./git/checkmk-local-checks/check_redis.py:        return self.keyspace_hits
./git/checkmk-local-checks/check_redis.py:    def get_keyspace_misses(self):
./git/checkmk-local-checks/check_redis.py:        """ returns keyspace_misses info from redis-cli """
./git/checkmk-local-checks/check_redis.py:        return self.keyspace_misses
./git/checkmk-local-checks/check_redis.py:    # add metrics for keyspace (hits, misses, percentage)
./git/checkmk-local-checks/check_redis.py:    service_output += cmk_service['metric_sep']+""+cmk_service['metric_keyspace_hits_prefix']+""+str(cmk_service['keyspace_hits'])
./git/checkmk-local-checks/check_redis.py:    service_output += cmk_service['metric_sep']+""+cmk_service['metric_keyspace_misses_prefix']+""+str(cmk_service['keyspace_misses'])
./git/checkmk-local-checks/check_redis.py:    hitmissratio = str(round(cmk_service['keyspace_hits']/(cmk_service['keyspace_hits']+cmk_service['keyspace_misses'])*100,2))
./git/checkmk-local-checks/check_redis.py:    # add metrics for keyspace per db (entries)
./git/checkmk-local-checks/check_redis.py:    'metric_keyspace_hits_prefix': 'keyspace_hits=',
./git/checkmk-local-checks/check_redis.py:    'keyspace_hits': 0,
./git/checkmk-local-checks/check_redis.py:    'metric_keyspace_misses_prefix': 'keyspace_misses=',
./git/checkmk-local-checks/check_redis.py:    'keyspace_misses': 0,
./git/checkmk-local-checks/check_redis.py:    'metric_hitmiss_ratio_prefix': 'keyspace_hitmiss_ratio(%)=',
./git/checkmk-local-checks/check_redis.py:    cmk_service.update({'keyspace_hits': redis.get_keyspace_hits()})
./git/checkmk-local-checks/check_redis.py:    cmk_service.update({'keyspace_misses': redis.get_keyspace_misses()})

./lib/python3/botocore/data/elasticache/2015-02-02/service-2.json:      "documentation":"<p>Modifies a replication group's shards (node groups) by allowing you to add shards, remove shards, or rebalance the keyspaces among exisiting shards.</p>"
./lib/python3/botocore/data/elasticache/2015-02-02/service-2.json:          "documentation":"<p>The keyspace for this node group (shard).</p>"
./lib/python3/botocore/data/elasticache/2015-02-02/service-2.json:          "documentation":"<p>A string that specifies the keyspace for a particular node group. Keyspaces range from 0 to 16,383. The string is in the format <code>startkey-endkey</code>.</p> <p>Example: <code>\"0-3999\"</code> </p>"

grep: ./local/share/nagios/htdocs/theme/stylesheets: No such file or directory
grep: ./local/share/nagios/htdocs/theme/images: No such file or directory

./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:        self.keyspace_hits = 0
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:        self.keyspace_misses = 0
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:            elif 'keyspace_hits:' in line:
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:                self.keyspace_hits = int(line.split(':')[1])
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:            elif 'keyspace_misses:' in line:
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:                self.keyspace_misses = int(line.split(':')[1])
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    def get_keyspace_hits(self):
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:        """ returns keyspace_hits info from redis-cli """
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:        return self.keyspace_hits
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    def get_keyspace_misses(self):
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:        """ returns keyspace_misses info from redis-cli """
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:        return self.keyspace_misses
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    # add metrics for keyspace (hits, misses, percentage)
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    service_output += cmk_service['metric_sep']+""+cmk_service['metric_keyspace_hits_prefix']+""+str(cmk_service['keyspace_hits'])
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    service_output += cmk_service['metric_sep']+""+cmk_service['metric_keyspace_misses_prefix']+""+str(cmk_service['keyspace_misses'])
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    hitmissratio = str(round(cmk_service['keyspace_hits']/(cmk_service['keyspace_hits']+cmk_service['keyspace_misses'])*100,2))
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    # add metrics for keyspace per db (entries)
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    'metric_keyspace_hits_prefix': 'keyspace_hits=',
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    'keyspace_hits': 0,
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    'metric_keyspace_misses_prefix': 'keyspace_misses=',
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    'keyspace_misses': 0,
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    'metric_hitmiss_ratio_prefix': 'keyspace_hitmiss_ratio(%)=',
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    cmk_service.update({'keyspace_hits': redis.get_keyspace_hits()})
./local/share/check_mk/agents/custom/check_redis/lib/local/check_redis.py:    cmk_service.update({'keyspace_misses': redis.get_keyspace_misses()})

./share/snmpsim/variation/redis.py:    if ('current-keyspace' not in recordContext or
./share/snmpsim/variation/redis.py:            recordContext['current-keyspace'] != keySpace):
./share/snmpsim/variation/redis.py:        log.msg('redis: now using keyspace %s (cycling period %s)' % (keySpace, recordContext['settings']['period'] or '<disabled>'))
./share/snmpsim/variation/redis.py:        recordContext['current-keyspace'] = keySpace

grep: ./tmp/run/redis: No such device or address
grep: ./tmp/run/dcd.sock: No such device or address

Thanks for your time,
Pixelpoint

Just go to the folder “~/tmp/check_mk/cache/” and open the file named after the HOSTNAME of your problem host.
What is inside is the relevant information. Or use “cmk -d HOSTNAME”.

And for the other metrics, please also avoid special characters, including brackets.

Content of ~/tmp/check_mk/cache/HOSTNAME
At least the relevant parts.

0 Redis-Memory mem_total=536870912|mem_used=1708352|keyspace_hits=2142524|keyspace_misses=42201|keyspace_hitmiss_ratio=98.07|db1(KEYS)=26|db2(KEYS)=225|db3(KEYS)=357 Memory Usage is: 1.63 MB / 512.0 MB
cmk -d HOSTNAME

0 Redis-Memory mem_total=536870912|mem_used=1708352|keyspace_hits=2142524|keyspace_misses=42201|keyspace_hitmiss_ratio=98.07|db1(KEYS)=26|db2(KEYS)=225|db3(KEYS)=357 Memory Usage is: 1.63 MB / 512.0 MB
0 Redis-Memory mem_total=536870912|mem_used=1708352|keyspace_hits=2142524|keyspace_misses=42201|keyspace_hitmiss_ratio(%)=98.07|db1(KEYS)=26|db2(KEYS)=225|db3(KEYS)=357 Memory Usage is: 1.63 MB / 512.0 MB

Yes, it’s in there 2 times.
Once it’s correct and up to date, once it’s the outdated, old version.
But why?

Yes, I will avoid any special characters.
Is underscore _ ok?

[EDIT]
Until now I manually overwrote the Local Check on the monitored host.
I did not want to bake an agent with an untested fix.
Now I updated the Agent and deployed it.
Don’t know if this information is of any significance or not.

Thank you for your time,
Pixelpoint

Yes

The cmk -d HOSTNAME output looks like two sources for the same information.
Is there anything at the beginning about piggyback data from any source?

Are both lines right after one another or in different parts of the output?

There is no mention of piggyback in the output of cmk -d HOSTNAME.

These lines are very close to each other, only separated by the UFW Local Check and LetsEncrypt Certificate Local Check output.
The first Redis-Memory line is correct (no (%) in output).
The second Redis-Memory line is incorrect (with (%) in output).

0 Redis-Memory mem_total=536870912|mem_used=1708352|keyspace_hits=2142524|keyspace_misses=42201|keyspace_hitmiss_ratio=98.07|db1(KEYS)=26|db2(KEYS)=225|db3(KEYS)=357 Memory Usage is: 1.63 MB / 512.0 MB
0 UFW - running\\nALLOW any any => any 22/tcp\\nALLOW any any => any 80/tcp\\nALLOW any any => any 443/tcp\\nALLOW SOME.IP.ADD.RESS/28 any => any 6556/tcp\\n\\n\\nLogging: on (low)\\nDefault: deny (incoming), allow (outgoing), deny (routed)\\nNew profiles: skip\\n
cached(1617287856,86400) 0 LetsEncrypt-Certs - All certs valid\\nCERT (57 days valid)\\nCERT (57 days valid)\\nCERT (57 days valid)\\nCERT (43 days valid)\\n
0 Redis-Memory mem_total=536870912|mem_used=1708352|keyspace_hits=2142524|keyspace_misses=42201|keyspace_hitmiss_ratio(%)=98.07|db1(KEYS)=26|db2(KEYS)=225|db3(KEYS)=357 Memory Usage is: 1.63 MB / 512.0 MB

Thank you for your time :slight_smile:

Greetings,
Pixelpoint

Then you have a second script running on this machine that produces this output.
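
This kind of duplicate can be spotted mechanically. The following is a rough sketch, assuming the agent dump (for example the text printed by cmk -d HOSTNAME) is available as a string; the function name is hypothetical, and quoted service names with spaces are only keyed on their first word.

```python
from collections import Counter


def duplicate_local_services(agent_output):
    """Return service names that occur more than once in local sections."""
    counts = Counter()
    in_local = False
    for line in agent_output.splitlines():
        if line.startswith("<<<"):
            # Section header: remember whether a local section starts here.
            in_local = line.startswith("<<<local")
            continue
        if not in_local or not line.strip():
            continue
        parts = line.split()
        # Skip an optional "cached(...)" prefix in front of the status field.
        if parts[0].startswith("cached("):
            parts = parts[1:]
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return sorted(name for name, seen in counts.items() if seen > 1)
```

For the dump shown above it would report Redis-Memory, which is the hint that two scripts are emitting the same service.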

You are absolutely correct.
It seems, some unspecified time ago, I copied the local check manually.
But instead of copying it to /usr/lib/check_mk_agent/local I copied it to /usr/lib/check_mk_agent/plugins
:open_mouth:

So I had the same local check (unmodified, still using metric keyspace_hitmiss_ratio(%)) in the plugins directory.

So what I did to get it working again (on a single host):

ssh SITEUSER@CHECKMK_SERVER_ADDRESS
omd stop
for file in $(find -name HOSTNAME); do rm -r $file; done
omd start

This will delete all the files and folders containing the monitored host’s data.
After this the graphs started working again.

If you (like me) deployed your local check on multiple hosts and none of them show graphs, do this:

ssh SITEUSER@CHECKMK_SERVER_ADDRESS
omd stop
for file in $(find -name "SERVICE_NAME.*"); do rm $file; done

This will only delete RRD and INFO files for the service in question.
In my case, the service name is Redis-Memory and the above command will find all files named Redis-Memory.rrd and Redis-Memory.info across all monitored hosts and delete them.
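
Before deleting anything, it may be worth listing what the pattern would match. A small sketch, assuming the files follow the SERVICE_NAME.rrd / SERVICE_NAME.info naming described above; the function name is hypothetical and the service name is passed straight into a glob pattern, so names containing glob characters such as * or [ would need escaping.

```python
from pathlib import Path


def find_service_files(site_dir, service_name):
    """List files named '<service_name>.<suffix>' anywhere below site_dir."""
    root = Path(site_dir)
    # rglob searches recursively, like `find -name "SERVICE_NAME.*"`.
    return sorted(p for p in root.rglob(f"{service_name}.*") if p.is_file())


if __name__ == "__main__":
    # Dry run: print the candidates instead of removing them.
    for path in find_service_files(".", "Redis-Memory"):
        print(path)
```

Run from the site user's home directory, this only prints the matches, so the list can be reviewed before the actual rm.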

There are 2 lessons learned here:

  1. NEVER use special characters in local check output, even if it works right now. An update might break it.
  2. Before copying stuff manually, look at where you’re copying it.

Thank you so much.

Greetings,
Pixelpoint

You can use special characters, just not in performance data names. In the normal text output they are no problem.

Oh yes, that’s what I meant to say :slight_smile:
