I currently have a lot of missing RRD files. I am also looking to run convert-rrd to upgrade the existing files (which apparently can also cause the issue). We have had issues in the past with the conversion creating the RRD files but omitting some of the datasources from the new file. To manage the problem, I want to script the validation of the RRD data. I will have a list of hosts and services (and potentially a list of the expected RRD files) to drive this.
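As a starting point for the "list of expected RRD files" part, here is a minimal sketch that only checks whether each expected file exists and is non-empty. The function name find_missing_rrds is my own, and the check says nothing about whether the datasources inside the file are complete, which is the harder part of the problem.

```python
from pathlib import Path


def find_missing_rrds(expected_paths):
    """Return the expected RRD paths that are absent or zero bytes long.

    This is only a presence check; it does not open the RRD or
    inspect which datasources it contains.
    """
    missing = []
    for raw in expected_paths:
        path = Path(raw)
        # A path counts as missing if it is not a regular file or is empty.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(str(path))
    return missing
```

The list of paths would come from whatever inventory of hosts and services drives the validation.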
While accessing a bad datasource via the web UI will produce an entry in cmc.log, this is far too manual for the scale we will be operating at. Unfortunately, the state of the underlying RRD data reported in the web UI is embedded in JSON, which is embedded in JavaScript, which is embedded in HTML, which is embedded in JSON returned via a URL that requires an undocumented (and somewhat cryptic) set of parameters supplied in the body of the request. Hence it is not a good starting point. Even if this were documented and transparent, the endpoint is protected against CSRF.
While this KB article seemed to address my requirement exactly, there is not enough information provided there for me to implement a solution. It appears that the suggestion of using LiveQuery from the command line has the fewest unexplained parameters.
That page suggests that running:
lq "GET services\nFilter: host_name = mysite\nFilter: service_description = Filesystem /\nColumns: host_name\nColumns: service_description\nColumns: rrddata:m1:fs_used.max,1024,/:1614839543:1614929543:30\nOutputFormat: python"
would return some RRD data.
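Since I want to drive this from a list of hosts and services, the query text could be generated rather than typed. The sketch below only assembles the same headers as the command above, with real newlines instead of the literal \n; the function name build_rrd_query is hypothetical, and the rrddata column is passed through untouched because its structure is exactly what I do not yet understand.

```python
def build_rrd_query(host, service, rrd_column):
    """Assemble a livestatus query shaped like the KB example.

    rrd_column is treated as an opaque string such as
    "rrddata:m1:fs_used.max,1024,/:1614839543:1614929543:30".
    """
    lines = [
        "GET services",
        f"Filter: host_name = {host}",
        f"Filter: service_description = {service}",
        "Columns: host_name",
        "Columns: service_description",
        f"Columns: {rrd_column}",
        "OutputFormat: python",
    ]
    # No trailing newline is added; the caller decides how to terminate it.
    return "\n".join(lines)


query = build_rrd_query(
    "mysite",
    "Filesystem /",
    "rrddata:m1:fs_used.max,1024,/:1614839543:1614929543:30",
)
print(query)
```

How the resulting text is handed to lq or to the livestatus socket is left out here.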
However, the “Syntax” section seems to confuse the structure of the Columns attribute with specific examples. It seems the pattern is 6 parameters separated by colons: rrddata:var1:metric1.max:start:end:step
Comparing this with the provided command line:
rrddata → “rrddata” so presumably a literal value
var1 → “m1” - I have NO IDEA what this represents
metric1.max → “fs_used.max” - so this appears to be the datasource name along with the aggregation method, but is “max” appropriate for all types of data?
,1024 - No idea
,/ - presumably this refers to the root filesystem, and is embedded in the RRD filename
The remaining values make some sense.
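To make the breakdown above concrete, this sketch splits the example column on colons and labels the six fields. The field names are my own guesses based on the pattern quoted from the page, not names taken from the documentation, and the third field is kept as one unparsed string.

```python
# Hypothetical labels for the six colon-separated fields.
FIELD_NAMES = ("keyword", "var", "expression", "start", "end", "step")


def split_rrddata_column(column):
    """Split an rrddata column spec into its six colon-separated fields."""
    parts = column.split(":")
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(parts)}: {column!r}"
        )
    return dict(zip(FIELD_NAMES, parts))


example = "rrddata:m1:fs_used.max,1024,/:1614839543:1614929543:30"
fields = split_rrddata_column(example)
for name, value in fields.items():
    print(f"{name:10s} {value}")

# Length of the requested time window in seconds.
window = int(fields["end"]) - int(fields["start"])
print("window seconds:", window)
```

For the example, the window works out to 90000 seconds (25 hours), and the part I cannot interpret is isolated in the var and expression fields.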
How can I translate the information in the RRD file path and/or the associated XML into a livestatus query?