High CPU usage on server after upgrade

Hello!

I upgraded my Ubuntu server to 20.04 and then upgraded CheckMK to version 2.0.0p9 (RAW). The upgrade went perfectly, and I updated all my agents. Super cool!

Since then, my server shows 60-70% CPU usage (Hyper-V reports around 30% for the VM).

I’ve been searching for the cause and found that perfdata seems to be the culprit. After digging a little, I saw that some graphs are not working anymore.

Memory and paging: Cannot get RRD data, pagefile/mem_used_percent, cannot create graph. Some other graphs, like CPU usage, have no data since the upgrade.

Has anyone had this issue?

For info:
The VM has 4 CPUs and 6 GB of RAM, and runs on SSD. The CPU is a Ryzen 3600X.
Total hosts: 23
Total services: 111

Thanks

With those resources, writing all the graphing data should not be a problem.
I would take a look at the log files of rrdcached and of the PNP4Nagios performance data processing.

~/var/log/rrdcached.log
~/var/pnp4nagios/log/npcd.log

Hi, and thanks for your reply.

Here’s what npcd.log shows.

~/var/log/rrdcached.log is empty.

If you look at the spool folder named in the error message, do you see many files there?

There are about 60k files in that folder :open_mouth:
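
For anyone following along, a count like that can be obtained without listing the whole directory. This is only a minimal sketch: `count_spool_files` is a hypothetical helper name, and the spool path is an assumption based on the paths in the npcd error lines, seen from the site user's home directory.

```shell
# count_spool_files is a hypothetical helper, not part of Checkmk or PNP4Nagios.
# It counts regular files directly inside a directory (non-recursive).
count_spool_files() {
    dir="$1"
    # find prints one line per file; wc -l counts the lines
    find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Assumed spool location for the site user; override SPOOL_DIR if yours differs.
SPOOL_DIR="${SPOOL_DIR:-$HOME/var/pnp4nagios/spool}"
if [ -d "$SPOOL_DIR" ]; then
    echo "files in spool: $(count_spool_files "$SPOOL_DIR")"
fi
```

A count that keeps growing between two runs would suggest the perfdata files are being written but not processed.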

I would stop the site.
Try one of the process_perfdata commands to get some information about the problem.
Clean up the spool folder. Be aware that the data there will then be lost.
Also clean up the “~/var/rrdcached/” folder.
Start the site and investigate the logs and the spool folder.
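
The steps above could be scripted roughly as follows. Treat this as a sketch under assumptions, not a tested procedure: it is meant to run as the site user, `cleanup_perfdata` and the `RUN_CLEANUP` switch are hypothetical names introduced here, and the paths are the ones mentioned in this thread. Everything in both folders is deleted.

```shell
# cleanup_perfdata is a hypothetical helper, not part of Checkmk/OMD.
# It empties the two directories but keeps the directories themselves.
cleanup_perfdata() {
    spool_dir="$1"
    cache_dir="$2"
    find "$spool_dir" -mindepth 1 -delete
    find "$cache_dir" -mindepth 1 -delete
}

# Only act when explicitly asked to (RUN_CLEANUP=yes, an assumed safety
# switch) and when the omd command is available.
if [ "${RUN_CLEANUP:-no}" = "yes" ] && command -v omd >/dev/null 2>&1; then
    omd stop
    cleanup_perfdata "$HOME/var/pnp4nagios/spool" "$HOME/var/rrdcached"
    omd start
    # Look at the most recent npcd messages after the restart.
    tail -n 20 "$HOME/var/pnp4nagios/log/npcd.log"
fi
```

Without `RUN_CLEANUP=yes` in the environment the script only defines the function and does nothing else.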

I cleaned the folders and still get these entries in npcd.log:

[09-16-2021 11:20:28] NPCD: ERROR: Command line was '/omd/sites/b2/lib/pnp4nagios/process_perfdata.pl -n -c /omd/sites/b2/etc/pnp4nagios/process_perfdata.cfg -b /omd/sites/b2/var/pnp4nagios/spool//perfdata.1631805597'
[09-16-2021 11:20:28] NPCD: ERROR: Executed command exits with return code '1'
[09-16-2021 11:20:28] NPCD: ERROR: Command line was '/omd/sites/b2/lib/pnp4nagios/process_perfdata.pl -n -c /omd/sites/b2/etc/pnp4nagios/process_perfdata.cfg -b /omd/sites/b2/var/pnp4nagios/spool//perfdata.1631805567'
[09-16-2021 11:20:28] NPCD: ERROR: Executed command exits with return code '1'
[09-16-2021 11:20:28] NPCD: ERROR: Command line was '/omd/sites/b2/lib/pnp4nagios/process_perfdata.pl -n -c /omd/sites/b2/etc/pnp4nagios/process_perfdata.cfg -b /omd/sites/b2/var/pnp4nagios/spool//perfdata.1631805582'
[09-16-2021 11:20:28] NPCD: ERROR: Executed command exits with return code '1'
[09-16-2021 11:20:28] NPCD: ERROR: Command line was '/omd/sites/b2/lib/pnp4nagios/process_perfdata.pl -n -c /omd/sites/b2/etc/pnp4nagios/process_perfdata.cfg -b /omd/sites/b2/var/pnp4nagios/spool//perfdata.1631805627'
[09-16-2021 11:20:28] NPCD: ERROR: Executed command exits with return code '1'
[09-16-2021 11:20:28] NPCD: ERROR: Command line was '/omd/sites/b2/lib/pnp4nagios/process_perfdata.pl -n -c /omd/sites/b2/etc/pnp4nagios/process_perfdata.cfg -b /omd/sites/b2/var/pnp4nagios/spool//perfdata.1631805612'

Some graphs are now present, but they still show no data. CPU usage has dropped now.

Can you execute one of those commands manually to check for further error messages?

Sure, here’s the output:

sudo ./process_perfdata.pl -n -c /omd/sites/b2/etc/pnp4nagios/process_perfdata.cfg -b /omd/sites/b2/var/pnp4nagios/spool//perfdata.1631805627
HiRes.c: loadable library and perl binaries are mismatched (got handshake key 0xde00080, needed 0xcd00080)

I guess I can try deleting a node and recreating it to see if that fixes the issue (which wouldn’t be a problem, since I don’t have 200 hosts haha).

I think I know what the problem is. Is there some other Perl version installed than the distribution one from Ubuntu 20.04?
I think your CMK package is the 20.04 package :slight_smile:
It is a mismatch between the Perl version expected inside CMK and the version in the system.
Or something went wrong with the distribution upgrade.
What should help as a last resort: back up your site, install a clean Ubuntu 20.04, install the CMK package, and restore the site backup.
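
A quick way to see whether the Perl in use can load the module named in the error is sketched below. The "loadable library and perl binaries are mismatched" message generally means a compiled module was built for a different Perl build than the interpreter loading it. `check_hires` is a hypothetical helper name, and running it as the site user (so the same environment applies as for npcd) is an assumption on my side.

```shell
# check_hires is a hypothetical helper, not part of Checkmk or Perl.
# It reports whether the given perl binary can load the Time::HiRes module.
check_hires() {
    perl_bin="$1"
    if "$perl_bin" -MTime::HiRes -e 1 2>/dev/null; then
        echo "ok"
    else
        echo "mismatch-or-missing"
    fi
}

# Show which perl comes first in PATH and its version, then test it.
if command -v perl >/dev/null 2>&1; then
    command -v perl
    perl -e 'print "perl $^V\n"'
    echo "Time::HiRes: $(check_hires perl)"
fi
```

If this prints `mismatch-or-missing` for the site user but `ok` for another user, that would point at module files left over from the previous installation rather than at the system Perl itself.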

Yes. I did a v18 to v20 upgrade, then installed cmk 2.0.0p8.

After that, I deleted the old version of cmk.

That’s something I can do: back up, then restore on a fresh VM.

Well, I downloaded the latest version, but I couldn’t restore the backup on it. So I updated my 2.0.0p9 to 2.0.0p11 stable, and it fixed everything :open_mouth:

Could be related to
