Restructure the nix agents

Hi,

Apologies in advance for the wall of text.

For those who haven’t seen my name yet: I’ve contributed improvements to the various *nix agent scripts, including a near-complete, POSIX-compatible merge of them all, which involved some significant overhauls and code improvements (see PR #28).

As I’ve been tracking commits to the existing scripts and merging them into the monolithic merged script, it has become increasingly clear to me that such an approach for the *nix agent is not scalable or manageable. I’ve actually had those misgivings from the very start.

I have had other issues with the older agents when deployed across different *nix variants, Linux distros or even packages within the same distro (e.g. the tribe29 rpm and EPEL rpm use different directories). And it’s all a bit stupid (IMHO), because paths like /usr/lib/check_mk_agent/local/, /usr/share/check-mk-agent/local, /usr/share/check-mk-agent/plugins and similar examples are non-obvious and, honestly, a little bit obnoxious.

So my proposal is to restructure the nix agents to use a modular approach rooted at /opt/checkmk/agent.

While /opt seems to be Linux-centric, the latest version of the FHS states:

Rationale
The use of /opt for add-on software is a well-established practice in the UNIX community. The System V Application Binary Interface [AT&T 1990], based on the System V Interface Definition (Third Edition), provides for an /opt structure very similar to the one defined here. The Intel Binary Compatibility Standard v. 2 (iBCS2) also provides a similar structure for /opt.

And, obviously, anybody can package it to reside elsewhere if they choose.

I propose splitting the massive merged monolithic script up into modules and simple libraries. Such a structure might look something like:

/opt/checkmk/agent/bin/checkmk_agent    # agent script
/opt/checkmk/agent/lib/common.sh        # lib path, referencing a shell library of common functions
/opt/checkmk/agent/include/common.sh    # possible alternative to lib
/opt/checkmk/agent/local-available/     # path for available local checks
/opt/checkmk/agent/local-enabled/       # path for available local checks that will be run by the agent
/opt/checkmk/agent/plugins-available/   # as above, but for plugins
/opt/checkmk/agent/plugins-enabled/

And so on with other pieces of structure, along with symlinks (which can be managed via package scripts) where required, e.g.:

/opt/checkmk/agent/var/ --> /var/opt/checkmk/agent/
/opt/checkmk/agent/etc/ --> /etc/opt/checkmk/agent/
/opt/checkmk/agent/tmp/ --> /tmp/checkmk/agent/
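The -available/-enabled pairs in the layout above suggest that enabling a check is just a matter of symlinking it across. A minimal sketch of how that could work, assuming the proposed directory names; the `enable_check` and `disable_check` function names are made up for illustration and are not part of any existing agent:

```shell
#!/bin/sh
# Hypothetical helpers for the proposed layout; AGENT_HOME is an assumed variable.
AGENT_HOME="${AGENT_HOME:-/opt/checkmk/agent}"

# enable_check KIND NAME
#   KIND is "local" or "plugins"; links KIND-available/NAME into KIND-enabled/.
enable_check() {
    kind="$1"
    name="$2"
    src="$AGENT_HOME/${kind}-available/$name"
    dst="$AGENT_HOME/${kind}-enabled/$name"
    # Refuse to create a dangling link
    [ -f "$src" ] || { printf 'no such check: %s\n' "$src" >&2; return 1; }
    mkdir -p "$AGENT_HOME/${kind}-enabled" || return 1
    ln -sf "$src" "$dst"
}

# disable_check KIND NAME
#   Removes only the link; the file in KIND-available/ is left untouched.
disable_check() {
    rm -f "$AGENT_HOME/$1-enabled/$2"
}
```

Usage would be something like `enable_check plugins timesync.sh`. A package's post-install script could call the same helpers to enable a default set of checks.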

The agent’s job is then greatly simplified: it’s invoked via xinetd, systemd or some other method, attempts to find a sane interpreter, sets some environment variables and then loops through whatever is defined in local-enabled and plugins-enabled. Much of what is currently in the agent script can then be spun out to either local-available or plugins-available (or, alternatively, some other path like /opt/checkmk/agent/core-checks/).
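To give a feel for how small the agent could become, here is a rough sketch of that loop. It assumes the proposed directory names and that each enabled check is an executable file (or a symlink to one); `run_enabled` and `run_agent` are illustrative names, and interpreter detection is left out entirely:

```shell
#!/bin/sh
# Hypothetical sketch of the simplified agent; not the real checkmk_agent.
AGENT_HOME="${AGENT_HOME:-/opt/checkmk/agent}"

# Run every executable file in a directory, in glob (sorted) order.
run_enabled() {
    dir="$1"
    [ -d "$dir" ] || return 0
    for check in "$dir"/*; do
        # Skips the unexpanded glob of an empty directory and anything
        # that is not an executable regular file (symlinks are followed)
        [ -f "$check" ] && [ -x "$check" ] || continue
        # A failing check is reported on stderr but does not stop the agent
        "$check" || printf 'check failed: %s\n' "$check" >&2
    done
}

# Export the environment the checks rely on, then run both directories.
run_agent() {
    export AGENT_HOME
    run_enabled "$AGENT_HOME/plugins-enabled"
    run_enabled "$AGENT_HOME/local-enabled"
}
```

Calling `run_agent` at the end of bin/checkmk_agent would be the whole main program. Each check owns its own output format, which is what makes per-check fixes possible.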

Once this modular approach is implemented, it then becomes far easier to apply fixes and improvements in isolation from the rest of the agent code. For example, PR #116 should have only applied to a file like /opt/checkmk/agent/core-checks/timesync.sh.

The modular approach conveniently fixes the main outstanding issue in PR #28, namely that POSIX sh has no portable way to export functions to child processes (export -f is a bashism). In the modular layout, we can simply have those functions in the bin/ directory as standalone scripts.
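A sketch of how the standalone-script idea could replace exported functions: the agent puts its bin/ directory on PATH once, and every child process inherits it. The `use_agent_helpers` function and the `section_header` helper are hypothetical names chosen for this example:

```shell
#!/bin/sh
# Hypothetical sketch; AGENT_HOME follows the proposed layout.
AGENT_HOME="${AGENT_HOME:-/opt/checkmk/agent}"

# Make helpers in bin/ callable by name from every check the agent launches.
use_agent_helpers() {
    case ":$PATH:" in
        *":$AGENT_HOME/bin:"*) ;;              # already on PATH, nothing to do
        *) PATH="$AGENT_HOME/bin:$PATH" ;;
    esac
    export PATH
}

# A helper is then an ordinary script, e.g. a hypothetical
# $AGENT_HOME/bin/section_header containing:
#
#   #!/bin/sh
#   echo "<<<$1>>>"
#
# A check in plugins-enabled/ could call it as: section_header uptime
```

Because PATH is an ordinary environment variable, this works the same in dash, ksh, bash or whatever interpreter a given check happens to use.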

The other thing that this modular approach enables is a potential migration towards (ND)JSON style output. The thought of trying to do that in the merged nix agent script just fills me with dread. With the modular approach, however…

▓▒░$ bash checkmk_agent | jq -r '.'
{
  "checkmk": {
    "Version": "TESTING",
    "AgentOS": "linux",
    "Hostname": "minty",
    "AgentDirectory": "/etc/check_mk",
    "DataDirectory": "/var/lib/check_mk_agent",
    "SpoolDirectory": "/var/lib/check_mk_agent/spool",
    "PluginsDirectory": "/usr/lib/check_mk_agent/plugins",
    "LocalDirectory": "/usr/lib/check_mk_agent/local"
  },
  "timestamp": {
    "utc_epoch": 1583230981
  }
}

I’m happy to do much of the heavy lifting on the agent side of the equation, if anyone is interested in making the requisite changes on the server side.

Any questions and/or feedback appreciated :)

After a little more fun with NDJSON formatting:

▓▒░$ bash checkmk_agent    
{"checkmk": {"Version": "testing-json", "AgentOS": "linux", "Hostname": "minty", "AgentDirectory": "/home/rawiri/git/checkMK/agents/nix/etc", "DataDirectory": "/home/rawiri/git/checkMK/agents/nix/var", "SpoolDirectory": "/home/rawiri/git/checkMK/agents/nix/var/spool", "PluginsDirectory": "/home/rawiri/git/checkMK/agents/nix/plugins-enabled", "LocalDirectory": "/home/rawiri/git/checkMK/agents/nix/local-enabled"}, "timestamp": {"utc_epoch": 1583318477}}
<<<fileinfo:sep(124)>>>
1583318477
[[[header]]]
name|status|size|time
[[[content]]]
/tmp/validate_tld|ok|551|1583308175
/tmp/pants|missing
{"fileinfo": [{"name": "/tmp/validate_tld", "status": "ok", "size": 551, "time": 1583308175},{"name": "/tmp/pants", "status": "missing", "size": null, "time": null} ], "timestamp": {"utc_epoch": 1583318477}}
{"uptime": {"uptime": 1583021.05, "idle": 5110854.69, "who_b": "system boot  Feb 15 15:57"}}

So, for the sake of comparison, I’ve kept the existing layout for the fileinfo check alongside an example of how it might be represented in JSON format:

<<<fileinfo:sep(124)>>>
1583318477
[[[header]]]
name|status|size|time
[[[content]]]
/tmp/validate_tld|ok|551|1583308175
/tmp/pants|missing

vs

{"fileinfo": [{"name": "/tmp/validate_tld", "status": "ok", "size": 551, "time": 1583308175},{"name": "/tmp/pants", "status": "missing", "size": null, "time": null} ], "timestamp": {"utc_epoch": 1583318477}}

Or, when pretty printed:

{
  "fileinfo": [
    {
      "name": "/tmp/validate_tld",
      "status": "ok",
      "size": 551,
      "time": 1583308175
    },
    {
      "name": "/tmp/pants",
      "status": "missing",
      "size": null,
      "time": null
    }
  ],
  "timestamp": {
    "utc_epoch": 1583353110
  }
}
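For anyone wondering how a line like that could be produced from plain shell, here is a minimal sketch. It is not the code behind the output above. The function names are invented for illustration, it assumes either GNU `stat -c` or BSD `stat -f` is available, it relies on `date +%s` (widespread, but not strictly POSIX), and it only escapes backslashes and double quotes in file names:

```shell
#!/bin/sh
# Escape backslashes and double quotes for use inside a JSON string.
# Control characters (including newlines) are not handled in this sketch.
json_escape() {
    printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Print "size mtime" for a file: try GNU stat first, then BSD stat.
file_size_mtime() {
    stat -c '%s %Y' "$1" 2>/dev/null || stat -f '%z %m' "$1" 2>/dev/null
}

# Emit one NDJSON line describing the files given as arguments.
fileinfo_json() {
    sep=''
    printf '{"fileinfo": ['
    for f in "$@"; do
        name=$(json_escape "$f")
        if [ -e "$f" ] && meta=$(file_size_mtime "$f"); then
            # meta is "size mtime": split it on the single space
            printf '%s{"name": "%s", "status": "ok", "size": %s, "time": %s}' \
                "$sep" "$name" "${meta% *}" "${meta#* }"
        else
            printf '%s{"name": "%s", "status": "missing", "size": null, "time": null}' \
                "$sep" "$name"
        fi
        sep=', '
    done
    printf '], "timestamp": {"utc_epoch": %s}}\n' "$(date +%s)"
}
```

Something like `fileinfo_json /tmp/validate_tld /tmp/pants | jq .` would then give the pretty-printed form. Emitting the whole section as a single line is what keeps the stream valid NDJSON.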

This JSON structure makes adding extra fields like mode, owner, group and checksum dead simple, and that improves the capability of fileinfo to the point that it fundamentally becomes a FIM (file integrity monitor).

Actually, this was so easy that I went ahead and did it:

{
  "fileinfo": [
    {
      "name": "/tmp/validate_tld",
      "status": "ok",
      "size": 551,
      "uid": 1000,
      "gid": 1000,
      "mode": 640,
      "atime": 1583358170,
      "mtime": 1583308175,
      "checksum": "a0d4c6b2ff06279f242eb38e4e7a01ca85d5a8444cb9a43f16ca666d992b41eb"
    },
    {
      "name": "/tmp/pants",
      "status": "missing",
      "size": null,
      "uid": null,
      "gid": null,
      "mode": null,
      "atime": null,
      "mtime": null,
      "checksum": null
    }
  ],
  "timestamp": {
    "utc_epoch": 1583358463
  }
}
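The checksum field is the only one that needs a tool beyond stat. A sketch of a portable lookup, assuming the checksum is SHA-256 (the 64 hex digits above are consistent with that, but it is a guess) and using a made-up function name, `file_sha256`:

```shell
#!/bin/sh
# Print the SHA-256 digest of a file as lowercase hex, or the bare word
# "null" if the file is unreadable or no suitable tool is installed.
# The caller is expected to wrap a real digest in double quotes for JSON.
file_sha256() {
    [ -f "$1" ] && [ -r "$1" ] || { printf 'null\n'; return 0; }
    # Reading from stdin keeps the file name out of the tool's output
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum < "$1" | cut -d' ' -f1
    elif command -v shasum >/dev/null 2>&1; then
        shasum -a 256 < "$1" | cut -d' ' -f1
    elif command -v openssl >/dev/null 2>&1; then
        # Output looks like "...(stdin)= <hex>"; keep only the digest
        openssl dgst -sha256 < "$1" | sed 's/^.*= //'
    else
        printf 'null\n'
    fi
}
```

Trying sha256sum, then shasum, then openssl covers most Linux distros, the BSDs and macOS without adding a hard dependency on any one of them.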

Example code can now be accessed here:

It is very rough around the edges with a lot to be fixed.