Hi,
after the upgrade to Checkmk Enterprise 2.0.0p11, rrdcached is not running on two slaves, so currently I don't have any graphs. We have a distributed monitoring setup with 8 slaves and a master. When I try to restart rrdcached manually, I get the following:
~$ omd start rrdcached
Temporary filesystem already mounted
~$ omd restart rrdcached
Temporary filesystem already mounted
In ~/tmp there is no rrdcached.pid:
~$ ll tmp/
total 0
drwxr-xr-x 4 site site 100 Oct 7 11:31 apache/
drwxr-xr-x 10 site site 220 Oct 6 11:24 check_mk/
-rw-r--r-- 1 site site 0 Oct 4 11:23 initialized
drwxr-xr-x 3 site site 60 Oct 4 11:23 liveproxyd/
drwxr-xr-x 2 site site 40 Oct 4 11:23 lock/
drwxr-xr-x 4 site site 80 Oct 4 11:23 nagios/
drwxrwxr-x 4 site site 80 Oct 4 11:23 nagvis/
drwxr-xr-x 5 site site 100 Oct 4 11:23 php/
drwxr-xr-x 5 site site 100 Oct 4 11:23 pnp4nagios/
drwxr-xr-x 2 site site 40 Oct 4 11:23 rrdcached/
drwxr-xr-x 4 site site 320 Oct 7 11:52 run/
On all check_mk hosts I get this:
~$ omd status rrdcached
-----------------------
Overall state: unused
And there is no rrdcached at all:
~$ omd status
mkeventd: running
liveproxyd: running
mknotifyd: running
cmc: running
apache: running
dcd: running
redis: running
stunnel: running
xinetd: running
crontab: running
-----------------------
Overall state: running
Any idea how I can start it again?
Thanks
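For anyone hitting the same symptoms, a minimal sketch that checks the two rrdcached pieces discussed in this thread: the init script under ~/etc/init.d/ and the UNIX socket ~/tmp/run/rrdcached.sock that cmc.log complains about. The function name is hypothetical, and the assumption that `omd` drives rrdcached through ~/etc/init.d/rrdcached should be verified on your own site.

```shell
# Sketch (assumption): report which rrdcached pieces are missing in an OMD
# site home. Both paths are the ones quoted in this thread.
check_rrdcached_files() {
    local site_home="${1:-$HOME}"
    local status=0
    # Init script used by "omd start/status rrdcached" (assumed location).
    if [ ! -e "$site_home/etc/init.d/rrdcached" ]; then
        echo "missing init script: $site_home/etc/init.d/rrdcached"
        status=1
    fi
    # UNIX socket the core connects to (path taken from cmc.log).
    if [ ! -S "$site_home/tmp/run/rrdcached.sock" ]; then
        echo "missing socket: $site_home/tmp/run/rrdcached.sock"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "ok"
    fi
    return "$status"
}
```

Run it as the site user without arguments; the site user's home is the site directory, so `$HOME` is used by default.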
CFriedrich
(Christian Friedrich)
October 7, 2021, 12:15pm
Hi,
do you find any messages about the rrdcached service in the logfiles?
regards
Christian
Hi Christian,
I see a lot of these messages:
var/log/cmc.log.1:2021-10-06 00:00:48 [4] [rrdcached thread] [rrdcached at "/omd/sites/site/tmp/run/rrdcached.sock"] cannot connect: No such file or directory
var/log/cmc.log:2021-10-07 12:16:43 [4] [client 5] Error flushing RRD: Unable to connect to rrdcached: No such file or directory
var/pnp4nagios/log/perfdata.log:2021-03-25 16:01:31 [2980802] [0] RRDs::update ERROR rrdcached@unix:/omd/sites/site/tmp/run/rrdcached.sock: illegal attempt to update using time 1616688084.000000 when last update time is 1616688084.000000 (minimum one second step)
It already started on 2021-10-04:
2021-10-04 11:22:39 [5] [rrdcached thread] started
2021-10-04 11:22:39 [5] [core 1046] -----------------------------------------------------------------
2021-10-04 11:22:39 [5] [core 1046] Check_MK Micro Core started with PID 1046
2021-10-04 11:22:39 [5] [core 1046] version 2.0.0p11 compiled Thu, 16 Sep 2021 12:17:02 +0000 on debian-10
2021-10-04 11:22:39 [5] [core 1046] built with g++-10 (GCC) 10.1.0, using RE2 regex engine
2021-10-04 11:22:39 [5] [core 1046] loaded configuration 408 (0xea8390) from 2021-10-04 11:22:39 with 80 hosts and 1861 services in 9.08754 ms
2021-10-04 11:22:39 [5] [core 1046] loaded saved program state with 80 hosts, 1861 services, 0 comments, and 5 downtimes in 7.31639 ms
2021-10-04 11:22:39 [5] [main] [livestatus manager] starting
2021-10-04 11:22:39 [5] [main] [livestatus manager] listening on /omd/sites/site/tmp/run/live
2021-10-04 11:22:39 [5] [main] [livestatus manager] created 20 Livestatus threads with stack size 4194304 in 1.39756 ms
2021-10-04 11:22:39 [5] [core 1046] [livestatus local] Successfully created new command pipe at "/omd/sites/site/tmp/run/nagios.cmd".
2021-10-04 11:22:39 [5] [core 1046] [livestatus local] Successfully opened command pipe at "/omd/sites/site/tmp/run/nagios.cmd".
2021-10-04 11:22:39 [5] [main] [RRD helper 1070] started, commandline: /omd/sites/site/bin/cmk --create-rrd --keepalive
2021-10-04 11:22:39 [5] [carbon thread] [carbon connection pool] started
2021-10-04 11:22:39 [5] [core 1046] building state history cache for the time period from 2019-10-05 11:22:39 to 2021-10-04 11:22:39 (730 days)
2021-10-04 11:22:39 [5] [alert helper 1072] started, commandline: /omd/sites/site/bin/cmk --handle-alerts --keepalive
2021-10-04 11:22:39 [5] [generic pool] [helper 1073] started, commandline: /omd/sites/site/lib/cmc/checkhelper
2021-10-04 11:22:39 [5] [generic pool] [helper 1074] started, commandline: /omd/sites/site/lib/cmc/checkhelper
2021-10-04 11:22:39 [5] [generic pool] [helper 1075] started, commandline: /omd/sites/site/lib/cmc/checkhelper
2021-10-04 11:22:39 [5] [generic pool] [helper 1076] started, commandline: /omd/sites/site/lib/cmc/checkhelper
2021-10-04 11:22:39 [5] [generic pool] [helper 1077] started, commandline: /omd/sites/site/lib/cmc/checkhelper
2021-10-04 11:22:39 [5] [generic pool] started 5 helpers in 5.71226 ms
2021-10-04 11:22:39 [5] [checker pool] [helper 1078] started, commandline: /omd/sites/site/bin/cmk --checker
2021-10-04 11:22:39 [5] [checker pool] [helper 1079] started, commandline: /omd/sites/site/bin/cmk --checker
2021-10-04 11:22:39 [5] [checker pool] [helper 1080] started, commandline: /omd/sites/site/bin/cmk --checker
2021-10-04 11:22:39 [5] [checker pool] [helper 1081] started, commandline: /omd/sites/site/bin/cmk --checker
2021-10-04 11:22:39 [5] [checker pool] started 4 helpers in 7.5122 ms
2021-10-04 11:22:39 [5] [real-time pool] [helper 1082] started, commandline: /omd/sites/site/bin/cmk --keepalive --real-time-checks
2021-10-04 11:22:39 [5] [real-time pool] started 1 helper in 1.49708 ms
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1083] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1084] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1085] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1086] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1087] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1088] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [core 1046] finalized 1941 history caches in 1.61908 ms
2021-10-04 11:22:39 [5] [core 1046] ,-Cache for statehist------------------------------------------------------------.
2021-10-04 11:22:39 [5] [core 1046] | |
2021-10-04 11:22:39 [5] [core 1046] | parsed speed cached |
2021-10-04 11:22:39 [5] [core 1046] | ----------- ------------- ---------------------- |
2021-10-04 11:22:39 [5] [core 1046] | 1 Logfiles 17.36 Logfiles/s 1941 hosts/services |
2021-10-04 11:22:39 [5] [core 1046] | 0.001 GB of data 12.141 MB/s 3896 host/service events |
2021-10-04 11:22:39 [5] [core 1046] | 0.006 Mio messages 0.104 Mio messages/s 1 core starts/stops |
2021-10-04 11:22:39 [5] [core 1046] | 0.2 days of history 2.01 entries per host/serv. |
2021-10-04 11:22:39 [5] [core 1046] | 3895.00 entries per day |
2021-10-04 11:22:39 [5] [core 1046] | 7842 strings |
2021-10-04 11:22:39 [5] [core 1046] | 2192 unique strings (28.0%) |
2021-10-04 11:22:39 [5] [core 1046] | 00:00 parsing time |
2021-10-04 11:22:39 [5] [core 1046] '--------------------------------------------------------------------------------'
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1089] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1090] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1091] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1092] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1093] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1094] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] [helper 1095] started, commandline: /omd/sites/site/bin/fetcher
2021-10-04 11:22:39 [5] [fetcher pool] started 13 helpers in 89.3714 ms
2021-10-04 11:22:39 [5] [notification helper 1096] started, commandline: /omd/sites/site/bin/cmk --notify --keepalive
2021-10-04 11:22:39 [5] [icmpsender 1097] started, commandline: /omd/sites/site/lib/cmc/icmpsender 8 0 1000
2021-10-04 11:22:39 [5] [icmpreceiver 1098] started, commandline: /omd/sites/site/lib/cmc/icmpreceiver
2021-10-04 11:23:32 [4] [rrdcached thread] [rrdcached at "/omd/sites/site/tmp/run/rrdcached.sock"] cannot connect: No such file or directory
2021-10-04 11:23:54 [4] [rrdcached thread] [rrdcached at "/omd/sites/site/tmp/run/rrdcached.sock"] cannot connect: No such file or directory
2021-10-04 11:24:14 [4] [rrdcached thread] [rrdcached at "/omd/sites/site/tmp/run/rrdcached.sock"] cannot connect: No such file or directory
2021-10-04 11:24:38 [4] [rrdcached thread] [rrdcached at "/omd/sites/site/tmp/run/rrdcached.sock"] cannot connect: No such file or directory
2021-10-04 11:25:11 [4] [rrdcached thread] [rrdcached at "/omd/sites/site/tmp/run/rrdcached.sock"] cannot connect: No such file or directory
2021-10-04 11:25:35 [4] [rrdcached thread] [rrdcached at "/omd/sites/site/tmp/run/rrdcached.sock"] cannot connect: No such file or directory
2021-10-04 11:25:51 [4] [rrdcached thread] [rrdcached at "/omd/sites/site/tmp/run/rrdcached.sock"] cannot connect: No such file or directory
2021-10-04 11:26:07 [4] [rrdcached thread] [rrdcached at "/omd/sites/site/tmp/run/rrdcached.sock"] cannot connect: No such file or directory
2021-10-04 11:26:49 [4] [rrdcached thread] [rrdcached at "/omd/sites/site/tmp/run/rrdcached.sock"] cannot connect: No such file or directory
2021-10-04 11:27:37 [4] [rrdcached thread] [rrdcached at "/omd/sites/site/tmp/run/rrdcached.sock"] cannot connect: No such file or directory
2021-10-04 11:27:49 [4] [client 3] error: client connection terminated: timeout
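To see on which days the connection errors occur (here they begin right after the core start on 2021-10-04), a minimal sketch that counts rrdcached connection errors per day. The two message patterns are copied from the log lines quoted above; the function name is hypothetical.

```shell
# Sketch: print "<date> <count>" for each day with rrdcached connection
# errors in the given cmc.log files.
rrdcached_errors_by_day() {
    # -h suppresses the "file:" prefix so the date is the first field.
    grep -hE 'rrdcached.*cannot connect|Unable to connect to rrdcached' "$@" \
        | awk '{ count[$1]++ } END { for (d in count) print d, count[d] }' \
        | sort
}
```

Example: `rrdcached_errors_by_day ~/var/log/cmc.log ~/var/log/cmc.log.1`.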
CFriedrich
(Christian Friedrich)
October 7, 2021, 1:09pm
Maybe a stupid question, but have you checked whether a file system is full?
If that's not the case, maybe someone else can help. Unfortunately, I am not yet such an expert.
There is more than enough disk space. Thanks, Christian.
Can you restart only rrdcached?
Before testing this, I would clean up the old cache/spool files with RRD data.
I get the above on all Checkmk hosts, not only on those where the daemon is not running, and the daemon doesn't even show up in the omd status output, as posted above.
Which files exactly should I clean up?
I just created a new site on a fresh Linux installation, and rrdcached is present there. So I assume it has to do with the upgrade from the Raw to the Enterprise edition, because, as I said, rrdcached is not running on any of the current Checkmk hosts, but only 4 of them have no rrdcached socket, which is pretty strange!
rrdcached must run in both the Raw and the Enterprise edition; it is needed to store the performance data in all versions of CMK.
On the command line, I would inspect the ~/etc/init.d/ folder. Is there no rrdcached file there either?
If it is there, check the file ~/etc/omd/site.conf; it should contain the entry CONFIG_PNP4NAGIOS='on'.
That's all I would check.
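The site.conf check can be scripted. A minimal sketch, assuming every entry has the form CONFIG_KEY='value' as in the site.conf posted further down; the function name is hypothetical. To change the value, `omd config set PNP4NAGIOS on` with the site stopped is presumably the intended route, but note that the site.conf shown below is marked "Managed by Puppet", so there the change would belong in the Puppet configuration.

```shell
# Sketch (assumption): print the value of CONFIG_<KEY> from an OMD site.conf
# whose lines look like CONFIG_PNP4NAGIOS='on'.
site_conf_value() {
    local conf="$1" key="$2"
    # Keep only the quoted value of matching lines; if the key occurs more
    # than once, the last occurrence wins.
    sed -n "s/^CONFIG_${key}='\(.*\)'\$/\1/p" "$conf" | tail -n 1
}

# Example (run as the site user):
# if [ "$(site_conf_value ~/etc/omd/site.conf PNP4NAGIOS)" != "on" ]; then
#     echo "PNP4NAGIOS is not enabled for this site"
# fi
```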
Setting CONFIG_PNP4NAGIOS='on' did it. Thanks, Andreas.
Hi Andreas, I still have an issue with one slave: there are no graphs at all. rrdcached is present and running. Journals in ~/var/rrdcached/ are created, but they are empty. After I converted the RRDs, they are present in ~/var/check_mk/rrd/. CONFIG_PNP4NAGIOS='on' is correctly set.
~$ cat ~/etc/omd/site.conf
# Managed by Puppet. DO NOT EDIT!
#
CONFIG_ADMIN_MAIL=''
CONFIG_APACHE_MODE='own'
CONFIG_APACHE_TCP_ADDR='127.0.0.1'
CONFIG_APACHE_TCP_PORT='5000'
CONFIG_AUTOSTART='on'
CONFIG_CORE='cmc'
CONFIG_DOKUWIKI_AUTH='off'
CONFIG_LIVEPROXYD='on'
CONFIG_LIVESTATUS_TCP='on'
CONFIG_LIVESTATUS_TCP_ONLY_FROM='192.168.1.61'
CONFIG_LIVESTATUS_TCP_PORT='6557'
CONFIG_LIVESTATUS_TCP_TLS='on'
CONFIG_MKEVENTD='on'
CONFIG_MKEVENTD_SNMPTRAP='off'
CONFIG_MKEVENTD_SYSLOG='off'
CONFIG_MKEVENTD_SYSLOG_TCP='off'
CONFIG_MULTISITE_AUTHORISATION='on'
CONFIG_MULTISITE_COOKIE_AUTH='on'
CONFIG_NAGIOS_THEME='dark'
CONFIG_NSCA='off'
CONFIG_NSCA_TCP_PORT='5667'
CONFIG_PNP4NAGIOS='on'
CONFIG_TMPFS='on'
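To confirm the "journals are created but empty" observation, a minimal sketch that counts the files in the journal directory and how many of them contain data. It assumes the journals live directly in ~/var/rrdcached/; the function name is hypothetical.

```shell
# Sketch (assumption): summarize rrdcached journal files in one directory.
journal_summary() {
    local dir="${1:-$HOME/var/rrdcached}"
    local total nonempty
    # Count regular files directly in the journal directory.
    total=$(find "$dir" -maxdepth 1 -type f | wc -l)
    # "-size +0c" matches files with at least one byte.
    nonempty=$(find "$dir" -maxdepth 1 -type f -size +0c | wc -l)
    # $((...)) drops the padding some wc implementations add.
    echo "journal files: $((total)), non-empty: $((nonempty))"
}
```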
Furthermore, when I add a new host to this slave, no RRDs are created: the host doesn't have a folder in ~/var/check_mk/rrd/.
The graphs on hosts of this specific slave look like this: [screenshot not preserved]
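To list every host that has no folder under ~/var/check_mk/rrd/, a minimal sketch that reads host names on stdin and prints those without a directory. Feeding it from `cmk --list-hosts` is an assumption about how to obtain the host list; the function name is hypothetical.

```shell
# Sketch: read host names (one per line) on stdin and print each host
# that has no directory under the given RRD base directory.
hosts_without_rrd_dir() {
    local rrd_base="$1" host
    while IFS= read -r host; do
        # Skip blank lines.
        [ -z "$host" ] && continue
        [ -d "$rrd_base/$host" ] || printf '%s\n' "$host"
    done
}

# Example (run as the site user):
# cmk --list-hosts | hosts_without_rrd_dir ~/var/check_mk/rrd
```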
ghassan
October 12, 2021, 7:31am
Unfortunately, I couldn't fix it and I don't know the reason, but to get the site back I did the following:
1. create a backup without RRDs (-N option)
2. create a new site
3. restore the backup to the new site
4. reinventory all hosts in the new site
5. stop and rename the original site
6. stop the new site and rename it to the original site's name
7. start the new site under the original name
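The workaround above can be written down as a command plan. A minimal sketch that only prints the commands (it runs nothing), so they can be reviewed and executed one by one as root. Assumptions: `omd restore NEWSITE TARFILE` run as root creates the new site from the archive (covering steps 2 and 3), `cmk -II` followed by `cmk -O` stands in for "reinventory", and the option placement follows `omd help`; verify all of this on your version. The site names in the example are hypothetical. Both sites run side by side until step 5 with the same site.conf, so TCP ports such as CONFIG_LIVESTATUS_TCP_PORT may collide.

```shell
# Sketch (assumption): print the rebuild steps from this thread as omd/cmk
# commands. Arguments: original site, temporary site, backup file,
# optional name for the retired original (default: <site>_old).
print_site_rebuild_plan() {
    local site="$1" tmp_site="$2" backup="$3" retired="${4:-${1}_old}"
    # Steps 1-4: backup without RRDs, restore as a new site, rediscover.
    # Steps 5-7: retire the original, move the new site into its place.
    cat <<EOF
omd backup -N $site $backup
omd restore $tmp_site $backup
omd start $tmp_site
su - $tmp_site -c 'cmk -II && cmk -O'
omd stop $site
omd mv $site $retired
omd stop $tmp_site
omd mv $tmp_site $site
omd start $site
EOF
}
```

Example: `print_site_rebuild_plan mysite mysite_new /tmp/mysite.tar.gz` prints nine commands, starting with the backup and ending with `omd start mysite`.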
system
(system)
Closed
October 12, 2022, 7:32am
This topic was automatically closed 365 days after the last reply. New replies are no longer allowed. Contact an admin if you think this should be re-opened.