[Check_mk (english)] disk usage off by a wide margin

Brain_Slug · March 7, 2016, 7:46pm

Hi all,

after upgrading CheckMK from 1.2.2p2 to 1.2.6p15 we've noticed that
disk usage reporting seems to be way off.

Example:

This is what "df (-h)" reports:

testhost2 ~ $ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 254941748 3985516 237999204 2% /
tmpfs 16414592 0 16414592 0% /dev/shm
/dev/sda1 495844 63149 407095 14% /boot

testhost2 ~ $ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 244G 3.9G 227G 2% /
tmpfs 16G 0 16G 0% /dev/shm
/dev/sda1 485M 62M 398M 14% /boot

Here's what the check_mk_agent reports to the server:

<<<df>>>
/dev/sda2 ext4 254941748 3985200 237999520 2% /
tmpfs tmpfs 16414592 0 16414592 0% /dev/shm
/dev/sda1 ext3 495844 63149 407095 14% /boot
<<<df>>>

And this is what the Nagios server reports on the web site and in the
pnp4nagios graphs:

fs_/ OK 6.6% used (*16.16* of 243.13 GB)
fs_/boot OK 17.9% used (*86.67* of 484.22 MB)

This is more than 400%(!) off for fs_/ and a good 25% off for fs_/boot.

The offset seems to be random but significant on every single host we
have in our cluster.

Is anybody else seeing this? Is there anything we need to change in our
configurations to have CheckMK report the right values on the web site
and in the rrd graphs?

Thanks!

MarsellusWallace · March 8, 2016, 7:19am

Hi,

no problems like that here.

I’d install a fresh OMD test site without any additional configuration, add one of the hosts in question and check whether the wrong filesystem sizes show up. If they don’t, the problem lies somewhere in your configuration (customized df check or df.include file, custom rules, whatever).

BTW: did you try to re-inventorize (“TabulaRasa” in WATO or “cmk -II” in the terminal) one of the hosts, restart and check again?

Regards,

Marcel

checkmk-en mailing list

checkmk-en@lists.mathias-kettner.de

http://lists.mathias-kettner.de/mailman/listinfo/checkmk-en

Brain_Slug · March 8, 2016, 9:22pm

Hi Marcel,

thanks for the suggestions.

I stood up a brand new test server today. We don't use OMD but install
CMK via "setup.sh" from the sources. Nagios 3.5.0 packages are from the EPEL
repositories, otherwise it is a "naked" CentOS 6.3 server.

The only change I made after the fresh install was to add a file
"all_hosts.mk" to /opt/check_mk/etc/conf.d that has two host definitions
in it:

all_hosts += [
    "testhost1",
    "testhost2",
]

Ran an inventory with -II and started via -O.

Same problem - all my file system reports are way off:

testhost1:

testhost1 conf.d $ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 244G 2.4G 229G 2% /
tmpfs 16G 0 16G 0% /dev/shm
/dev/sda1 485M 62M 398M 14% /boot

WebGui reports:

fs_/ OK 6.0% used (14.70 of 243.13 GB)
fs_/boot OK 17.9% used (86.69 of 484.22 MB)

For testhost2:

testhost2 ~ $ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 244G 3.9G 227G 2% /
tmpfs 16G 0 16G 0% /dev/shm
/dev/sda1 485M 62M 398M 14% /boot

and on the GUI:

fs_/ OK 6.6% used (16.17 of 243.13 GB)
fs_/boot OK 17.9% used (86.67 of 484.22 MB)

I also checked both GUIs, Nagios and CMK's web page. Both report
the same values for both hosts, and both are wrong.
The Check_MK agent on the hosts is reporting correct numbers to the
server, though.

Any other ideas as to what might be going wrong here? A fresh install
using 1.2.2p2 works without issues and reports proper file system
numbers. I'm puzzled.

Thanks!

Brain_Slug · March 8, 2016, 9:35pm

OK, just for the heck of it I installed the OMD / RAW Edition to see if
this was any different from the source install. I couldn't find
1.2.6p15 as a download option any longer, so I had to use 1.2.6p16
instead. However, exactly the same problem.

Fresh OS installation, installed the RAW 1.2.6p16 rpm, added a single host via
WATO, and I get the same wrong filesystem readings on the CMK web page. Here are the
readings for testhost1:

df / Filesystem / 6.0% used (14.70 of 243.13 GB)
df /boot Filesystem /boot 17.9% used (86.69 of 484.22 MB)

So the problem does not seem to be related to Source install vs. OMD
rpm. And again, the agent on the client is reporting correct values to
the server.

Any pointers would be much appreciated, I'd hate to have to downgrade to
1.2.2p2 again.

Cheers!

Jam_Mulch · March 8, 2016, 10:24pm

Looking at the code…
/omd/sites//share/check_mk/checks/df

    # Beware: the 6th column of df ("used perc") may includes 5% which are reserved
    # for the superuser, whereas the 4th colum ("used MB") does not include that.
    # Beware(2): the column used_mb does not account for the reserved space for
    # superusers. So we rather use the column 'avail' and subtract that from total
    # to compute the used space.

Using just the 1K-blocks and Available columns in your df output, I get:

(1 - (237999204/254941748)) * 100 =  6.64565% used
(1 - (407095/495844)) * 100       = 17.89857% used

Check the computation using total and available vs. using used and
available and you will see they are not consistent, probably due to the
reserved space mentioned in the comments above not being included in all
columns.

IMHO, the way Check_MK computes space usage seems more useful
from a monitoring standpoint.
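
The two ways of computing usage can be put side by side in a short Python sketch. This is not the actual Check_MK check code; the function names `used_like_checkmk` and `used_like_df` are made up for illustration, and the numbers are the 1K-block values from the df output quoted in this thread. It also assumes that GNU df derives Use% from used / (used + available), rounded up, which is consistent with the 2% and 14% shown by df here.

```python
# Illustrative sketch only; not the actual Check_MK df check.
# All values are the 1K-block figures from the `df` output for testhost2.

def used_like_checkmk(total_kb, avail_kb):
    """Used space as total minus available, so root-reserved blocks count as used."""
    used_kb = total_kb - avail_kb
    return used_kb, 100.0 * used_kb / total_kb

def used_like_df(used_kb, avail_kb):
    """Percentage relative to the space an ordinary user can use (assumed df behaviour)."""
    return 100.0 * used_kb / (used_kb + avail_kb)

# mount point -> (total, used, available), all in 1K blocks
filesystems = {
    "/":     (254941748, 3985516, 237999204),
    "/boot": (495844, 63149, 407095),
}

for mount, (total, used, avail) in filesystems.items():
    cmk_used_kb, cmk_perc = used_like_checkmk(total, avail)
    reserved_kb = total - used - avail  # blocks that appear in neither Used nor Available
    print("%-6s total-avail: %.2f%% (%d KB)  used/(used+avail): %.2f%%  reserved: %d KB"
          % (mount, cmk_perc, cmk_used_kb, used_like_df(used, avail), reserved_kb))
```

For `/` the gap between the two results is the roughly 12.4 GB that shows up in neither the Used nor the Available column, which is about 5% of the filesystem.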

Brain_Slug · March 11, 2016, 9:02pm

Thanks Jam!

This makes sense and explains why the offset is so much bigger on an
almost empty disk and less pronounced on a disk that is fairly full.
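
A small Python sketch of that effect, for anyone who finds this thread later. It assumes a hypothetical 100 GB filesystem with 5% reserved for root (the figure mentioned in the check's comment); the helper name `reported_vs_actual` is made up for illustration.

```python
# Illustrative only: hypothetical 100 GB filesystem with 5% reserved for root.
TOTAL_GB = 100.0
RESERVED_GB = 0.05 * TOTAL_GB

def reported_vs_actual(data_gb):
    """Return (reported used GB, reported/actual ratio) when reserved blocks count as used."""
    reported_gb = data_gb + RESERVED_GB
    return reported_gb, reported_gb / data_gb

for data_gb in (1.0, 10.0, 50.0, 90.0):
    reported_gb, ratio = reported_vs_actual(data_gb)
    print("actual %5.1f GB -> reported %5.1f GB (factor %.2f)"
          % (data_gb, reported_gb, ratio))
```

The absolute offset is the same fixed reserve in every case, so the factor shrinks toward 1 as the disk fills up.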

I can see that this is now also described on the "df" Check manual page

https://mathias-kettner.de/checkmk_check_df.html

which was updated 03/11.

Cheers!