omd restore should rely on several CPUs

I suppose that "omd restore site.tar.gz" relies on just one CPU. In my case the restore takes more than an hour. I suggest using at least half of the available CPUs for the restore.

Hi @DonMarcocello, welcome to the checkmk forum.

Your suggestion is good, but with a mostly dormant Python plugin in an enterprise product you won't get very far. The restore process isn't needed that often, so it's not the first place I would start when speeding up Checkmk.
If you use the backup and restore functionality to build a test site, you can leave out the historical data (RRD and/or logs) via the command line options -N / --no-past for both, or --no-rrd and --no-logs for either one. You can also create the tarball without compression via --no-compression, which speeds up the process massively at the cost of storage.
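For example, using the flags named above (a sketch; "mysite" and the target path are just placeholders):

omd backup --no-past --no-compression mysite /tmp/mysite.tar

or, to drop only the RRDs but keep the historical logs:

omd backup --no-rrd mysite /tmp/mysite.tar.gz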


Thank you for your suggestion. After using the --no-compression option (I hadn't thought of it before), my assumption that gzip is the root cause was not confirmed. Thank you for your help.
It seems as if I have something in my installation that slows down the whole backup/restore.

Thinking it through again, parallelizing the gzip process for the backup/restore function could have a huge impact on your monitoring. If it uses up all or a large share of your monitoring server's CPUs, you end up with slowed-down monitoring, which is mostly not what you want.
If you really want a parallelized gzip, you can create the backup uncompressed and pipe it through an external compressor, e.g. a parallel gzip implementation such as pigz: omd backup --no-compression <site> - | pigz > output.tar.gz. I haven't tried this, but in theory you could also use this across servers, compressing on one machine and decompressing on the other:
omd backup --no-compression <site> - | gzip | ssh user@host 'gzip -d | omd restore -'
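And to follow the original suggestion of using at most half of the CPUs: pigz can cap its worker threads with -p, so the compressor leaves headroom for the monitoring processes (again untested, and assuming pigz is installed):

omd backup --no-compression <site> - | pigz -p $(( $(nproc) / 2 )) > output.tar.gz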

Good point
After some research I found that ~/var/check_mk/site_configs is the root cause of my problems. The mentioned directory contains more than 6 million files. (As I have opened another issue, Ever growing directories, let's close this one)…
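For anyone who wants to check their own site, a quick way to count the files in the suspect directory is something like this (just an illustration; run as the site user):

find ~/var/check_mk/site_configs -type f | wc -l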
Thank you for your help.