Synthetic Monitoring in 2.4 and multiple RCC Checks Question

Hi all,

CheckMK 2.4.0p21 CME
OS - RHEL 9.7 (plow)

I’m working to transition our old RobotMK v1(.5) checks to the new Synthetic monitoring.
I held out to do this until CheckMK 2.4 to avoid rebuilding my robot army from RH9 to Windows and back again.

I’ve managed to get 1 rcc environment running on a RHEL9 server after some selinux exploration and learning. Now I’m ready to add other test suites but want to understand what that might look like.

My original robots lived in a single folder containing the conda.yaml requirements, with subfolders per product holding the .robot file(s). In version 1.5, these checks were defined under one rule and ran one after the other, occasionally running into timing issues if there were site/login delays. That was expected and documented in that version.

With the new built-in robotmk (Linux) agent rule:

  • It uses ‘first matching parameter’ logic, so everything still needs to be defined under a single rule if running on the same host.

  • I created the required robot.yaml which points to conda.yaml

    conda.yaml
    channels:
        - conda-forge
    dependencies:
        - python=3.11
        - pip
        - robocorp-truststore
        - nodejs
        - pip:
            - robotframework-browser
            - rpaframework
    rccPostInstall:
        - rfbrowser init chromium-headless-shell
    tracking:
        consent: true
        identity: <UUID>
    
  • I can ‘Add a New Sequence’ and define each check using the same parent folder and point to the same robot.yaml but define new variables, similar to the old way.

  • Since I am running headless, a couple of posts suggest this avoids the ‘mouse control’ issues mentioned in the documentation.

When I try this, the service for the 2nd check’s plan reports “No data available because none of the attempts produced any output”. Running the check manually from an rcc task shell works, after an rfbrowser init chromium-headless-shell.

So my guess is each one might need its own 'tracking: identity’, and I should generate a robot.yaml and conda.yaml in each subfolder and reference each from the same base folder? Or is there something obvious that I’m overlooking?

Calling it a day on my end of the world but thought I’d toss this out for review and see if anyone has any thoughts or guidance.

Sincerely,

Scotsie

Hi Scott,

Thanks for the detailed description.

First of all, regarding conda.yaml.

The rpaframework package is a huge meta-package, originally developed by Robocorp—the company was bought out (they now do “something with AI”…) and all the automation libraries for Robot Framework are only semi-well maintained. With rpaframework, you’re bringing in far too many libraries at once, which unnecessarily prolongs the construction of the environments.

I see that you’re running web tests – here’s an example of a functioning conda.yaml without all the overhead:
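Something along these lines should be enough for Browser-library web tests (a sketch, not a drop-in file; adjust the version pins to your needs):

```yaml
channels:
  - conda-forge

dependencies:
  - python=3.11
  - nodejs
  - pip
  - pip:
      - robotframework-browser

rccPostInstall:
  - rfbrowser init chromium-headless-shell
```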

You can also omit the “tracking” key: it configures RCC’s telemetry, which Robotmk always disables by default. The identity has no relevance for Robotmk.

It is generally not a problem to have all test files in one directory and have them all use the same conda.yaml. (However, when it comes to testing different applications that may also be worked on by different people, I strongly recommend separate directories so that they can be maintained separately as repositories.)

Now to the actual error “No data available”: Could it be that the test is taking too long to run and the Robotmk timeout is kicking in? In this case, Robot Framework is aborted abruptly.

Can you see in Robotmk’s working_dir that an execution log (with timestamp) has been created for this plan? It also contains log files for stderr and stdout. There you can see how far the test execution gets until it gets aborted due to a timeout.

Best regards,

Simon

Hi @simonm ,

Thanks for the response.

Thanks for the tip. My original conda.yaml was from one of the original templates so thank you for that info. I’m updating it and the robot to switch from Library RPA.Browser.Playwright to Library Browser as I write this.

I didn’t actually include the tracking key or its subvalues; I thought rcc was appending that as part of the build process. It could be that some of my manual testing pointing at the config file did that instead.

For today’s testing, I removed the tracking key altogether and it hasn’t shown back up, so I bet some of my manual runs did that.

I have all of my old checks running using the same conda and robot.yaml now except 2, which are reporting issues with some keyword deprecation/changes (e.g. RETURN instead of [RETURN]). I’ll post a follow up once I get everything worked out.

Out of curiosity, for my workflow, is there still a way to use a DISABLE file to skip a single check? I have a couple of sites that I still have to manually stop, update forced-expiry tokens on, and then update the agent rule (password as a variable).

Sincerely,

Scotsie

Hi Scott,

the deprecation is easy to fix, see New `RETURN` statement for returning from user keywords · Issue #4078 · robotframework/robotframework · GitHub
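For reference, the fix in keyword code looks like this (RETURN is a statement since Robot Framework 5.0, replacing the old [Return] setting; the keyword name here is just an example):

```robotframework
*** Keywords ***
Get Greeting
    [Arguments]    ${name}
    # Old style was a setting:  [Return]    Hello, ${name}!
    RETURN    Hello, ${name}!
```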

To your last question: the DISABLE file is only supported in the MKP version of Robotmk.
(What “tokens” do you have to update, can you give more details about that?)

Regards, Simon

Indeed, it was easy enough for that one, and just a matter of reading some updated documentation for the others. If I hadn’t made a couple of case-sensitivity mistakes, it might have gone quicker. As of now I have all my existing checks running on a test server and will be migrating the production checks over shortly.

“Tokens” was a poor choice of words; I was working on something else while writing the response. I should have said updating variables. I am passing USERNAME and PASSWORD variables in the Robotmk scheduler rule to allow centralized management of this information, primarily because I have multiple hosts in various cloud regions to simulate customer logins from around the ‘world’.

My robots are fairly basic: log in and check for some piece of information. Most of the sites have longer password expirations, but one has a short 45-day period and locks permanently if I forget to update it. My old procedure used our orchestration toolset, Salt, to keep the robot from locking out the site while I was mid-update:

  • Place the robotmk and specific robot service in downtime
  • touch a DISABLED file in the robot folder to bypass the check.
  • Manually update the password in the portal
  • Manually run a local robot --outputdir /tmp --variable USERNAME:foo --variable PASSWORD:bar /path/to/site.robot and confirm all passed
  • Update the Agent rule with new password and bake
  • cmk-update-agent -v to pull the newer config
  • Remove the DISABLED file
  • Remove downtime once service(s) catch up

Hope this helps clarify the workflow I mentioned.

Sincerely,

Scotsie

Ah, I see.

This works, but I guess you are aware of the fact that the sensitive data are stored in plain text on the Robotmk host, right?

Did you try the robotframework-cryptolibrary?
It allows you to encrypt all sensitive data using an asymmetric key pair. The password for the private key is required to decrypt the sensitive data (encrypted values all begin with `crypt:`); I usually store this password as an environment variable specifically for the Robotmk scheduler’s systemd service.
By doing that, you can have the robot code in git without having sensitive data in it.
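For example, as a systemd drop-in (the unit name and the variable name below are placeholders; use whatever your scheduler runs as and whatever your robot code reads):

```ini
# "systemctl edit <scheduler-unit>" opens an override file like this:
[Service]
# Placeholder name; reference it in robot code, e.g.
#   Library    CryptoLibrary    %{CRYPTO_KEY_PASSWORD}
Environment="CRYPTO_KEY_PASSWORD=password-for-the-private-key"
```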

I think this could be a nice topic for a blog post. :thinking:

Yes and it was something I considered for a while before implementing in earlier robotmk. For those legacy checks, I was using ‘Fill Secret’ with the $PASSWORD variable to prevent logging/output visibility.

I saw the environment variables option in the documentation plus the inline help. I put it on my near-future to-do list. A blog post might be very handy.

Thanks for your help and feedback.

Scotsie


Is there anything else I can help with?

As a follow up, based on your earlier recommendation, I’ve created individual conda.yaml and robot.yaml files in each of my ‘APP’ subfolders.

This makes it easier to watch individual ./APP/output/ files for troubleshooting, and has the added benefit of letting me add dependencies that I must have manually installed on the older robot server (beautifulsoup4, requests).

As of this time I have 7 individual robots running from 4 different hosts on ‘cloud’ VMs.

Actually, yes, I could use your help with a couple of questions:

  • If I make a change to a conda.yaml file, it’s my understanding that the hash changes and the rcc environment should rebuild automatically on the next plan evaluation/run. There are a couple of mentions of this on the robotmk doc page.
    • This doesn’t seem to work reliably. Errors in the logs indicate it is ‘blocked by settings’ but give no details.
    • I have found that a forced reinstall of the agent via cmk-update-agent -f works more consistently, if a bit heavy-handed.
    • Is this expected or something that might need deeper investigation?
  • I’m seeing sporadic errors for individual Plans:
    Suite setup failed: Could not connect to the playwright process at 127.0.0.1:34213.
    • It happens across various plans individually, not all at once.
    • When it occurs, the services for both the plan and the check go critical
    • Is this just a resources issue? Are there rough sizing estimates ‘per robot’?

Sincerely,

Scotsie

Hi Scott,

the Scheduler works in 2 phases:

  • startup: RCC settings etc, including building the environments.
  • scheduling: executing the plans

That means: to rebuild the environments, the Checkmk agent must be restarted. This also invokes a scheduler restart, which then (and only then) builds new environments.

We decided against environment creation while other tests are running in parallel because this is a very resource-intensive process (it uses all cores minus one, plus a lot of network bandwidth, CPU, and disk I/O). The risk of influencing the runtime of the other running tests would be too high.
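As background on the “hash change” you read about: RCC identifies an environment by a content hash of its dependency file, so any edit to conda.yaml means a different environment identity; the scheduler just doesn’t act on it until its next startup. A rough Python illustration of the idea (not RCC’s actual code):

```python
import hashlib

def env_fingerprint(conda_yaml: str) -> str:
    """Content hash of the dependency file; any edit yields a new fingerprint."""
    return hashlib.sha256(conda_yaml.encode("utf-8")).hexdigest()[:16]

before = env_fingerprint("dependencies:\n  - python=3.11\n")
after = env_fingerprint("dependencies:\n  - python=3.12\n")
print(before != after)  # True: a changed conda.yaml means a new environment
```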

To your second question: how many tests do you have, and how are the machine resources?

Simon

Makes sense. I got a different impression from the mentions of the conda.yaml file change(s).

Each VM is 2 vcpus, 4 GiB memory. A very small footprint. There are 7 robots executing very basic headless Load Site > Login > Validate information > Logout steps at this time.

Scotsie

Hi Scott,

thanks, I see. The length of the test does not really have an impact here.

I usually start with VMs with 4 cpus and 8 GB memory, never had outages.

Try doubling the resources on one of your machines and see how it works. (I am sure monitoring won’t show you these short resource-consumption spikes, unfortunately.)

/Simon

@simonm ,

Appreciate the help and feedback. Bumping up the resources definitely helped with Plan build times and those earlier socket errors.

The changes to more recent versions of packages definitely caused some heartache but also gave me an opportunity to update my robots to handle some little latency/timeout issues we would occasionally see.

I’ve finalized the updates to my various robots and migrated a few resource settings to a common file with some options selected to avoid some selinux pitfalls with headless chromium in RH9.

Posting it here for reference if anyone else tries their hand at using a RedHat flavor of Linux and requires SELinux to remain enabled. I didn’t want to create a local user to have to manage/remember.

  • Built an SELinux .te policy by starting from a clean /var/log/audit/audit.log and letting robotmk run through a few iterations of a plan with setenforce 0, then compiled and imported it (plenty of web resources for this step).
  • Used some Chromium arguments to avoid cmk-agent (running as root) touching various directories/resources in /dev/udma or /root/, which triggered more SELinux denials and odd headless Chromium behavior.
  • Migrated to a common resource file for browser settings (after my 30th edit or so)
    • HOME folder set inside /opt/robotmk/ so the earlier SELinux policy labels apply
    • arguments --disable-gpu and --disable-dev-shm-usage to stop the /dev/ probing and resulting failure on Chromium start, instead of adding more SELinux policy rules for the /dev/ and /root/ folders.
# common.resource
*** Settings ***
Library    Browser

*** Keywords ***
Standard Browser Setup
    ${browser_env}=    Create Dictionary
    ...    HOME=/opt/robotmk/chrome-home
    ...    PATH=%{PATH}
    ...    DISPLAY=%{DISPLAY=}
    New Browser
    ...    browser=chromium
    ...    headless=True
    ...    args=["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox"]
    ...    env=${browser_env}

And then in each APPFOLDER/filename.robot, calling this as a resource:

*** Settings ***
Library         Browser
Resource        ../common.resource
Metadata        Version    202603051743

Suite Setup     Standard Browser Setup

Feedback or ‘better ways’ advice welcome but I’m currently happy with the results and looking forward to showcasing more robotframework testing possibilities to coworkers.

Sincerely,

Scotsie

PS: Officially removed robotmk v1.5 rules and extension today!

Hey Scott,

BrowserLibrary/Playwright in RHEL? This is gold, thanks so much for your documentation.

Did you have to install any other packages on the host to make this possible?

My state of knowledge was always that it only runs on Debian/Ubuntu.

Regards, Simon

Hi @simonm ,

I did not have to install any additional packages, but this was the same host I had running the legacy plugins, so it had all the packages and dependencies from the earlier build.

The Salt state file I used to prep for the robotmk 1.4/1.5 agent is below. I suspect the packages and rfbrowser init brought in the items needed.

If I build or redeploy I will try to take note and send you a message.

---
robotframework-packages-requirements:
  pkg.installed:
    - pkgs:
      - python3-pip
      - nodejs
      - npm
      - git

  npm.installed:
    - name: playwright@latest
    - require:
      - pkg: robotframework-packages-requirements

robotframework-python-setup:
  pip.installed:
    - pkgs:
      - mergedeep
      - robotframework
      - rpaframework
      - robotframework-browser
      - robotframework-datadriver
{# Adding bin_env to potentially handle centos7 dual python2/3 #}
    - bin_env: /usr/bin/python3
    - require:
      - pkg: robotframework-packages-requirements

robotframework-test-cases:
  git.latest:
    - name: https://<gitlab.fqdn>/path/to/robotframework-robots.git
    - target: /usr/lib/check_mk_agent/robotframework-robots
    - rev: main
    - https_user: saltmaster
    - https_pass: {{ api_token }}

  file.recurse:
    - name: /usr/lib/check_mk_agent/robot
    - source: salt://{{ slspath }}/files/robot
    - clean: True

robotframework-browser-init:
  cmd.run:
    - name: rfbrowser init
...

Interesting. Thanks a lot!

/Simon