Mk_ceph - I just do not understand how

Following the thread here: Proxmox mit Ceph - I am in the same position as the original poster.
The plugin is in the plugins directory, but I see no output in Check_MK.
I run it as sh ./mk_ceph.txt and there is no output either.

I suspect that I have not followed this instruction as I really cannot figure out what it is asking me to do:

# Check Ceph storage
# Config file must contain:
# USER=client.admin
# KEYRING=/etc/ceph/ceph.client.admin.keyring

Can someone re-word this comment in another way please?

As a first step, you can try running the commands manually, exactly as they are called by the agent plugin:
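
For example, something along these lines (client.admin and the keyring path are just the defaults from the plugin header - adjust them to your setup):

ceph -n client.admin --keyring=/etc/ceph/ceph.client.admin.keyring -s -f json-pretty

If that prints a JSON status block, the credentials are fine and the problem is on the plugin side; if it errors out, sort out the user and keyring path first.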

Hi, thanks.

Background: Ceph is running fine on a 3-node Proxmox cluster, and the Proxmox OMD checks are all green.

I'm trying to add the Ceph plugin but am just missing some fundamentals about HOW to do so. I have copied it to the plugins directory as outlined in the README, but no output appears when I refresh services for this server:

These plugins can be installed in the plugins directory of the Linux agent
in /usr/lib/check_mk_agent/plugins/. Please only install the plugins that
you really need.

The VARS are clearly incorrect:

$ echo $USER
*logged_in_username*
$ echo $KEYRING
*nothing returned*

Running the ceph command with bogus data (because I don't know what the real data should be) returns:

ceph -n 'logged_in_username' --keyring='made_up_key'
Error initializing cluster client: Error('rados_initialize failed with error code: -22')

My question is more about the config file mentioned here:

# Check Ceph storage
# Config file must contain:
# USER=client.admin
# KEYRING=/etc/ceph/ceph.client.admin.keyring

I have two ceph.conf files (plus an unrelated one under /usr/lib/tmpfiles.d):

user@server:$ find / -name ceph.conf 2>/dev/null
/usr/lib/tmpfiles.d/ceph.conf
/etc/ceph/ceph.conf
/etc/pve/ceph.conf
user@server:$

Both contain the same data and neither has a USER entry; both do contain a keyring entry:

[client]
	 keyring = /etc/pve/priv/$cluster.$name.keyring

My GUESS is that I should add a "USER = client.admin" line to one - or both - of these ceph.conf files. However, I am unclear whether client.admin is meant to be entered exactly like that, or whether it refers to some other username that I ought to know about - but do not.

What do you have under /etc/pve/priv?
Our mk_ceph agent plugin won't understand these variables in ceph.conf.
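
USER and KEYRING belong in the plugin's own config file under $MK_CONFDIR (typically /etc/check_mk/ceph.cfg), not in ceph.conf. On a Proxmox node that file would look roughly like this (the keyring path below is an assumption - point it at wherever your admin keyring actually lives):

USER=client.admin
KEYRING=/etc/pve/priv/ceph.client.admin.keyring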

I am not an expert on this topic. Maybe this helps: User Management - Ceph Documentation

I've been using mk_ceph with Proxmox without any issues prior to upgrading to 2.2, but it hasn't been working since the upgrade.

Running the plugin script directly didn't give any output, but when running the command manually I do get output:

ceph -n client.admin --keyring=/etc/pve/priv/ceph.client.admin.keyring -s -f json-pretty

Just found the solution and wanted to share:
There seems to be a problem with the variable $MK_CONFDIR, see

I've changed that line to . "/etc/check_mk/ceph.cfg" 2>/dev/null and it works again. Maybe $MK_CONFDIR has been changed in v2.2?
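
For reference, a minimal before/after of that line (the first variant is my assumption of what the plugin originally does):

# original: relies on the agent exporting MK_CONFDIR
. "$MK_CONFDIR/ceph.cfg" 2>/dev/null
# hard-coded workaround that made it work again
. "/etc/check_mk/ceph.cfg" 2>/dev/null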


So, the Checkmk agent cannot call the mk_ceph plugin at all?
How are you calling the agent plugin?

@herzkerl the environment variable $MK_CONFDIR is set by the agent on initialization, so all plugins run by the agent can use it. If you run the script manually, you need to set the variable manually first. The default value is /etc/check_mk/.
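
For a manual test you would therefore run something like this (the plugin path is an assumption - adjust it to wherever you installed mk_ceph):

MK_CONFDIR=/etc/check_mk sh /usr/lib/check_mk_agent/plugins/mk_ceph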

Your change just works around the missing environment variable, which is not a fix or solution.

There is a package from the exchange that I found.
Maybe it could be useful for you?

I only upgraded both our server and the agents to 2.2.x; I didn't change any other settings. Why would the $MK_CONFDIR variable work prior to the upgrade but stop working afterwards?

I installed that package a while ago, but I don't know what to do next, to be honest. I just read the documentation again and couldn't find an answer to it. Could you please point me in the right direction? :slight_smile:

I think we can ask @r.sander for some advice on this :slight_smile:

The extension comes with an agent plugin. You either deploy that via agent bakery or manually by copying it from $OMD_ROOT/local/share/agents/plugins to /usr/lib/check_mk_agent/plugins/60 on your Ceph nodes.
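
For the manual route, that is roughly the following (the plugin file name is a placeholder - use whatever file the package actually ships in that directory):

# on the Checkmk server, as the site user
scp "$OMD_ROOT/local/share/agents/plugins/<plugin>" root@<ceph-node>:/usr/lib/check_mk_agent/plugins/60/
# on the Ceph node, make sure it is executable
chmod +x /usr/lib/check_mk_agent/plugins/60/<plugin>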

The agent plugin currently has no configuration and requires python3-rados to be installed. It also needs /etc/ceph/ceph.conf and /etc/ceph/ceph.client.admin.keyring. This will be configurable in the future.
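
On a Proxmox (Debian-based) node, checking the prerequisites would look roughly like this:

# install the Python RADOS bindings
apt install python3-rados
# both files must exist and be readable by the agent
ls -l /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring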


Good news: I got it working :slight_smile: Thank you very much for explaining, @r.sander!

One small issue remains: when both plugins are active, Checkmk warns that there are now duplicate checks.

But once I deactivate the mk_ceph plugin, the following checks become stale:
• Ceph OSDs
• Ceph PGs
• Ceph Pool SUMMARY

You should only have one agent plugin active, either mk_ceph or ours.
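
Deactivating mk_ceph simply means removing (or moving away) its file from the agent's plugin directory on the node, for example something like:

mv /usr/lib/check_mk_agent/plugins/mk_ceph /root/mk_ceph.disabled

The stale services can then be removed with a normal service discovery.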


That's what I thought, but yours lacks the three checks I mentioned above. While I don't think 'Ceph Pool SUMMARY' has any useful information, the other two do:

  • Ceph OSDs: Epoch rate (15 minutes 0 seconds average): 0.00, OSDs: XX, Remapped PGs: 0, OSDs out: 0, 0%, OSDs down: 0, 0%
  • Ceph PGs: PGs: XXX, Status 'active+clean': XXX, Status 'active+clean+scrubbing+deep': X

Is there any way to keep only those two checks? Or maybe you want to put them in your plugin with the next release… :slight_smile:

The PGs are in the "Ceph Status" check. I have to look into the code to see what the other check does.


Hi,

I'm not able to get it working on all 3 nodes of my cluster :stuck_out_tongue:
On node 1 it is running and I get
CEPH-OSDs / PGs / Pool.mgr / Status

On the other nodes, I get the following error:

"WARNING: Parsing of section ceph_status failed - please submit a crash report! (Crash-ID: a3ae09a2-5224-11ee-9de7-92250e8ac011)"

Can you maybe help me out?

Thanks,
BR


Hi,

I have the same problem. Did you find a solution?

Yes, I had to copy the keyrings to the other nodes and then it worked.
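
For anyone else hitting this, on my nodes that boiled down to roughly the following (paths assumed - Proxmox keeps the admin keyring under /etc/pve/priv, while the plugin expects it under /etc/ceph):

cp /etc/pve/priv/ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring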

Thanks! I'm using /etc/pve/priv/ceph.client.admin.keyring now instead of /etc/ceph/ceph.client.admin.keyring, and now it works for all hosts.