NFS IO Stat PROBLEM

Hi!
I have the following problem in CheckMk 1.6.0p19. After putting the nfsiostat script in the agent plugin path on a CentOS 7.X machine, there seem to be some problems:

  1. infinitely repeated values

ex.

  2. values expressed in seconds and not in milliseconds

ex.

Is this correct? Is there any way to fix it?

Thx.
Dario.

Hi,
did you check the parameters for this service? What about the parameters shown for it in service discovery? It looks like the default parameters are matching.

Cheers,
Christian

Hi @ChristianM,

No changes have been made on the Checkmk side. If I run the command

nfsiostat | paste -sd ' ' - | tr -s ' '
nas-nfs-b.isilon:/hosttest mounted on /var/www: op/s rpc bklog 67.23 0.00 read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms) 8.763 277.908 31.715 0 (0.0%) 15.728 15.895 write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms) 0.139 20.428 147.294 0 (0.0%) 3.398 20.956

checkmk detects

Incorrect conversion?

Thx.
Dario.

I had a look at the code. A problem is only raised when a parameter is set. Please check the parameters of this service first.

Cheers,
Christian

Hi @ChristianM,

There are no parameters set.

Another strange thing is that the values it extracts always seem to be the same, as if it were keeping the data in a cache. Is this possible?

When checking the isilon the values go up and down and the measuring scale is totally different.

15.7 seconds is objectively too long as the unit of measurement should be ms.

Thx.
Dario.

Hi Dario,

yes, 15.7 seconds is too long, but if you look at the graph, you have a huge average. What about between 8:30 and 9:00? The values there should be in ms.

Cheers,
Christian

Hi @ChristianM,

If the command returns 15.728 (ms), why does the Checkmk graph show 15.7 (s)?

Another thing is that when we put the default script in place, the values it extracts are always the same.


Thx.
Dario.
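
On the ms-versus-s question above, a small sketch can make the expected arithmetic explicit. It pulls the read "avg RTT (ms)" value out of the flattened nfsiostat line quoted earlier in the thread and prints it in both units. The function name `read_rtt_ms_and_s` is hypothetical, and the fixed token offset assumes the older column layout shown in this thread (newer nfsiostat versions add columns). It says nothing about what the Checkmk check does internally.

```shell
#!/bin/bash
# Hypothetical helper: read one flattened nfsiostat line on stdin and print the
# read "avg RTT" in milliseconds and in seconds.
# Assumption: the layout quoted in this thread, where "read:" is followed by
# 10 header tokens and the RTT is the 6th value token ("0 (0.0%)" is two tokens).
read_rtt_ms_and_s() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "read:") {
                rtt_ms = $(i + 16)                  # 10 header tokens + 6th value
                printf "%.3f ms = %.6f s\n", rtt_ms, rtt_ms / 1000
            }
        }
    }'
}
```

Usage would be something like `nfsiostat | paste -sd ' ' - | tr -s ' ' | read_rtt_ms_and_s`. For the sample line in this thread, 15.728 ms corresponds to 0.015728 s, so a graph showing 15.7 s is off by a factor of 1000.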

I’ve raised this before. The plugin does not take into account that the first report printed by nfsiostat always contains average stats for the FS since it was mounted. To work properly, i.e. monitor the current stats of the mount, the agent needs to ignore the first report and return the next one instead.

Per nfsiostat man page:

   <interval>
          specifies  the  amount  of time in seconds between each report.  The first report contains statistics for the time since each file
          system was mounted.  Each subsequent report contains statistics collected during the interval since the previous report.

A simple fix might be:
nfsiostat 5 2 | sed 1,10d | paste -sd " " - | tr -s ' '
This makes nfsiostat print 2 reports with a 5-second gap, so the second report is an average over a short interval. The first 10 lines (= the first set of stats) are then removed by sed.

Update:
It’s not a simple fix. The above doesn’t take into account multiple NFS mounts. It also doesn’t take into account the fact that nfsiostat has changed output format over the years :(
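
One possible way around the hard-coded `sed 1,10d` is to keep only the last report seen for each mount, whatever its line count. The sketch below is untested against real nfsiostat versions. It assumes every report starts with a header line containing " mounted on " and that the header text is identical across reports for the same mount. The function name `last_report_per_mount` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical filter: from the output of "nfsiostat <interval> <count>", keep
# only the most recent report per mount and print each one on a single line.
# Assumption: each report begins with a header line containing " mounted on ".
last_report_per_mount() {
    awk '
        / mounted on / {
            key = $0
            if (!(key in block)) order[++n] = key   # remember first-seen order
            block[key] = ""                         # a newer report replaces the older one
        }
        key != "" && NF > 0 {
            block[key] = block[key] " " $0          # collect non-empty lines of this report
        }
        END {
            for (i = 1; i <= n; i++) {
                line = block[order[i]]
                gsub(/[ \t]+/, " ", line)           # squeeze runs of whitespace
                sub(/^ /, "", line)                 # drop the leading separator
                print line
            }
        }
    '
}
```

It would be used as `nfsiostat 5 2 | last_report_per_mount`. This prints one line per mount, which may or may not be what the check expects. The original plugin command joins everything into a single line, so the output might still need a final `paste -sd ' ' -`.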

Hi @pn-rallen,

Maybe this helps.

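#!/bin/bash

# Mount points of the NFS file systems; adjust the grep pattern to your environment.
list_nfs=$(df -h | grep "NFS" | awk '{print $6}')

#echo $list_nfs

# "command -v" only checks that nfsiostat exists; plain "command nfsiostat" would run it.
if command -v nfsiostat > /dev/null ; then
    echo '<<<nfsiostat>>>'
    for nfs_name in $list_nfs
    do
        # Two reports, one second apart; sed drops the first 10 lines (the since-mount report).
        nfsiostat 1 2 "$nfs_name" | sed 1,10d | paste -sd " " - | tr -s ' '
    done
fi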

Dario.

You probably want to mark program code as “preformatted text” to prevent the forum software from mangling it (it swallows the inner ‘<…>’ tags for example).

Hi @martin.schwarz, @pn-rallen

Thx.

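#!/bin/bash

#list_nfs=$(df -h | grep "nas-nfs" | awk '{print $6}')
# Take the mount points from "nfsstat -m": the first field of every line that contains a "/".
list_nfs=$(nfsstat -m | awk '{print $1}' | grep -e "/")

# "command -v" only checks that nfsiostat exists; plain "command nfsiostat" would run it.
if command -v nfsiostat > /dev/null ; then
    echo '<<<nfsiostat>>>'
    for nfs_name in $list_nfs
    do
      # Print the second report of every mount on the same output line, separated by spaces.
      echo -ne $(nfsiostat 1 2 "$nfs_name" | sed 1,10d | paste -sd " " - | tr -s ' ') " "
    done
fi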

Dario.
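
As a possible alternative to parsing `df` or `nfsstat -m`, the mount list could also be read from `/proc/mounts`, where the third field is the file system type. This is only a sketch. The function name `list_nfs_mounts` is hypothetical, and the optional file argument exists so it can be tried against a sample file. Note that `/proc/mounts` escapes spaces in mount points as `\040`, which this sketch does not decode.

```shell
#!/bin/bash
# Hypothetical helper: print the mount points of all file systems whose type
# is "nfs" or "nfs4". Reads /proc/mounts unless another file is given.
list_nfs_mounts() {
    # Fields in /proc/mounts: device, mount point, fs type, options, dump, pass
    awk '$3 ~ /^nfs4?$/ { print $2 }' "${1:-/proc/mounts}"
}
```

In the script above this would replace the `nfsstat -m` line with `list_nfs=$(list_nfs_mounts)`.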
