I’m working to transition our old RobotMK v1(.5) checks to the new Synthetic monitoring.
I held off on doing this until CheckMK 2.4 to avoid rebuilding my robot army from RH9 to Windows and back again.
I’ve managed to get 1 rcc environment running on a RHEL9 server after some selinux exploration and learning. Now I’m ready to add other test suites but want to understand what that might look like.
My original robots lived in one single folder with the conda.yaml requirements, plus subfolders per product containing the .robot file(s). In version 1.5, these checks were defined under one rule and ran one after the other, occasionally running into timing issues if there were site/login delays. Expected and noted in that version.
With the new built-in robotmk (Linux) agent rule:
It identifies ‘first matching parameter’ so everything still needs to be built under a single rule if running on the same host.
I created the required robot.yaml which points to conda.yaml
I can ‘Add a New Sequence’ and define each check using the same parent folder and point to the same robot.yaml but define new variables, similar to the old way.
Since I am running headless, I have seen a couple of posts indicating that this keeps the ‘mouse control’ issues mentioned in the documentation from being a problem.
When I try this, the service for the 2nd check’s plan reports “No data available because none of the attempts produced any output”. Manually running the check from an rcc task shell works after an rfbrowser init chromium-headless-shell.
So my guess is each one might need its own 'tracking:identity’ and I should generate a robot.yaml and conda.yaml file in each subfolder and reference each from the same base folder? Or is there something obvious that I’m overlooking?
Calling it a day on my end of the world but thought I’d toss this out for review and see if anyone has any thoughts or guidance.
The rpaframework package is a huge meta-package, originally developed by Robocorp—the company was bought out (they now do “something with AI”…) and all the automation libraries for Robot Framework are only semi-well maintained. With rpaframework, you’re bringing in far too many libraries at once, which unnecessarily prolongs the construction of the environments.
I see that you’re running web tests – here’s an example of a functioning conda.yaml without all the overhead:
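Something like this is enough for Browser-library web tests (a minimal sketch — the pinned versions are illustrative, use whatever you have tested):

```yaml
channels:
  - conda-forge

dependencies:
  - python=3.10          # illustrative pins; use versions you have tested
  - nodejs=18.15.0       # required by robotframework-browser (Playwright)
  - pip=23.0
  - pip:
      - robotframework==6.1
      - robotframework-browser==17.2.0

rccPostInstall:
  - rfbrowser init       # installs the Playwright browser binaries
```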
You can also omit the “tracking” key, which configures the telemetry of RCC, which Robotmk always disables by default. The identity has no relevance for Robotmk.
It is generally not a problem to have all test files in one directory and have them all use the same conda.yaml. (However, when it comes to testing different applications that may also be worked on by different people, I strongly recommend separate directories so that they can be maintained separately as repositories.)
Now to the actual error “No data available”: Could it be that the test is taking too long to run and the Robotmk timeout is kicking in? In this case, Robot Framework is aborted abruptly.
Can you see in Robotmk’s working_dir that an execution log (with timestamp) has been created for this plan? It also contains log files for stderr and stdout. There you can see how far the test execution gets until it gets aborted due to a timeout.
Thanks for the tip. My original conda.yaml was from one of the original templates so thank you for that info. I’m updating it and the robot to switch from Library RPA.Browser.Playwright to Library Browser as I write this.
I didn’t actually include the tracking key or subvalues; I thought rcc was appending that as part of the build process. It could be that some of my manual testing pointing to the config file did that instead.
For today’s testing, I removed the tracking key altogether and it hasn’t shown back up, so I bet some of my manual runs added it.
I have all of my old checks running using the same conda and robot.yaml now except 2, which are reporting issues with some keyword deprecation/changes (e.g. RETURN instead of [RETURN]). I’ll post a follow up once I get everything worked out.
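For anyone hitting the same thing: Robot Framework 5 replaced the old `[Return]` setting with the `RETURN` statement, e.g.:

```robotframework
*** Keywords ***
Get Page Heading
    ${heading}=    Get Text    css=h1
    # Old, now-deprecated style:
    # [Return]    ${heading}
    # Current style (Robot Framework 5+):
    RETURN    ${heading}
```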
Out of curiosity, for my workflow, is there still a way to use a DISABLE file to skip a single check? I have a couple of sites that I still have to manually stop, update forced expiry tokens on and then update the agent rule (password as a variable).
To your last question: the DISABLE file is only supported in the MKP version of Robotmk.
(What “tokens” do you have to update, can you give more details about that?)
Indeed, it was easy enough for that one and just reading some updated documentation for the others. If I hadn’t made a couple of case-sensitive mistakes, it might have gone quicker. As of this time I have all my existing checks running on a test server and will be migrating the production checks over shortly.
Tokens was a poor choice of words. Was working on something else while writing a response. I should have said updating variables. I am passing USERNAME and PASSWORD variables in the robotmk scheduler rule to allow centralized management of this information. Primarily because I have multiple hosts in various cloud regions to simulate customer login from the ‘world’.
My robots are fairly basic login and check for some piece of information. Most of the sites have longer password expiration but one has a short 45 day period and locks permanently if I forget to update it. My old procedure was using our orchestration toolset salt to avoid the robot locking out the site while I was mid-update:
1. Place the robotmk and specific robot service in downtime.
2. touch a DISABLED file in the robot folder to bypass the check.
3. Manually update the password in the portal.
4. Manually run a local robot --outputdir /tmp --variable USERNAME:foo --variable PASSWORD:bar /path/to/site.robot and confirm all passed.
This works, but I guess you are aware of the fact that the sensitive data are stored in plain text on the Robotmk host, right?
Did you try the robotframework-cryptolibrary?
It allows you to encrypt all sensitive data using an asymmetric key pair. The password for the private key is required to decrypt the sensitive data (they all begin with `crypt:` then); I usually store it as an environment variable specifically for the Robotmk scheduler’s systemd service.
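For example, as a systemd drop-in for the scheduler’s unit (the unit name and variable name here are illustrative, not fixed defaults):

```ini
# /etc/systemd/system/robotmk-scheduler.service.d/override.conf  (illustrative unit name)
[Service]
# Whatever variable name your suite reads the private-key password from:
Environment="CRYPTO_KEY_PASSWORD=changeme"
```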
By doing that, you can have the robot code in git without having sensitive data in it.
I think this could be a nice topic for a blog post.
Yes and it was something I considered for a while before implementing in earlier robotmk. For those legacy checks, I was using ‘Fill Secret’ with the $PASSWORD variable to prevent logging/output visibility.
I saw the environment variables option in the documentation and the inline help. I put it on my near-future to-do list. A blog post might be very handy.
As a follow up, based on your earlier recommendation, I’ve created individual conda.yaml and robot.yaml files in each of my ‘APP’ subfolders.
This makes it easier to watch individual ./APP/output/ files for troubleshooting, and it had the added benefit of allowing me to add a dependency that I must have manually installed on the older robot server (beautifulsoup4, requests).
As of this time I have 7 individual robots running from 4 different hosts on ‘cloud’ VMs.
Actually, yes, I could use your help with a couple of questions:
If I make a change to a conda.yaml file, it’s my understanding that the hash changes and the rcc environment should rebuild automatically on the next plan evaluation/run. There are a couple of mentions of this on the robotmk doc page.
This doesn’t seem to work reliably. Errors in the logs indicate it is ‘blocked by settings’, with no further details.
I have found that a forced reinstall of the agent via cmk-update-agent -f works more consistently if a bit heavy handed.
Is this expected or something that might need deeper investigation?
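My mental model of the rebuild trigger (an assumption based on the docs, not rcc’s actual code): the environment is keyed on a hash of the conda.yaml contents, so any byte change should produce a new key and thus a rebuild:

```python
import hashlib

def env_key(conda_yaml_text: str) -> str:
    """Key the environment on a hash of conda.yaml; a new hash implies a rebuild."""
    return hashlib.sha256(conda_yaml_text.encode("utf-8")).hexdigest()[:12]

# Any byte-level edit, even a comment or whitespace change, yields a new key:
print(env_key("dependencies:\n  - python=3.10\n")
      != env_key("dependencies:\n  - python=3.11\n"))  # True
```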
I’m seeing sporadic errors for individual Plans: Suite setup failed: Could not connect to the playwright process at 127.0.0.1:34213.
It happens across various plans individually, not all at once.
When it occurs, the services for both the plan and the check go critical.
Is this just a resources issue? Are there rough sizing estimates ‘per robot’?
The scheduler works in two phases:
startup: RCC settings etc., including building the environments.
scheduling: executing the plans.
That means: to rebuild the environments, the checkmk agent must be restarted. This also triggers a scheduler restart, which then (and only then) builds new environments.
We decided against environment creation while other tests are running in parallel because this is a very resource intensive process (it uses all cores -1, a lot of network bandwidth, CPU and disk IO). The risk to influence the runtime of the other running tests would be too high.
To your second question: how many tests do you have, and how are the machine resources?
Makes sense. I got a different impression from the mentions of the conda.yaml file change(s).
Each VM is 2 vcpus, 4 GiB memory. A very small footprint. There are 7 robots executing very basic headless Load Site > Login > Validate information > Logout steps at this time.
Thanks, I see. The length of the test does not really have an impact here.
I usually start with VMs with 4 cpus and 8 GB memory, never had outages.
Try doubling the resources on one of your machines and see how it works. (I am sure monitoring won’t show you these short resource-consumption spikes, unfortunately.)
Appreciate the help and feedback. Bumping up the resources definitely helped with Plan build times and those earlier socket errors.
The changes to more recent versions of packages definitely caused some heartache but also gave me an opportunity to update my robots to handle some little latency/timeout issues we would occasionally see.
I’ve finalized the updates to my various robots and migrated a few resource settings to a common file with some options selected to avoid some selinux pitfalls with headless chromium in RH9.
Posting it here for reference in case anyone else tries their hand at a Red Hat flavor of Linux and requires SELinux to remain enabled. I didn’t want to create a local user to have to manage/remember.
Built an SELinux .te policy by starting from a clean /var/log/audit/audit.log and letting Robotmk run through a few iterations of a plan with setenforce 0, then compiled and imported it (plenty of web resources for this step).
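For reference, the tooling for that step, sketched (the module name is illustrative; run as root):

```shell
setenforce 0                      # permissive: log denials instead of blocking
# ...run the Robotmk plan a few iterations so all denials land in audit.log...
audit2allow -a -M robotmk_local   # generate robotmk_local.te/.pp from the audit log
semodule -i robotmk_local.pp      # install the compiled policy module
setenforce 1                      # back to enforcing
```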
Used some Chromium arguments to avoid triggering more SELinux denials and odd headless Chromium behavior when the cmk-agent runs as root and touches various directories/resources in the /dev/udma or /root/ folders.
Migrated to a common resource file for browser settings (after my 30th edit or so)
HOME folder set inside /opt/robotmk/ so the earlier SELinux policy labels apply
arguments --disable-gpu and --disable-dev-shm-usage to stop the /dev/ search and failure on Chromium start, instead of adding more SELinux policy rules for the /dev and /root folders.
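A minimal sketch of what the common.resource boils down to (the keyword body is illustrative — it assumes the Browser library’s New Browser keyword with its args parameter, and the path is from my setup):

```robotframework
*** Settings ***
Library     Browser
Library     OperatingSystem

*** Keywords ***
Standard Browser Setup
    # Point HOME at the folder the SELinux policy labels already cover (path illustrative)
    Set Environment Variable    HOME    /opt/robotmk
    # --disable-gpu / --disable-dev-shm-usage stop the /dev lookups that
    # otherwise trip SELinux when the agent runs as root
    New Browser    chromium    headless=True
    ...    args=["--disable-gpu", "--disable-dev-shm-usage"]
    New Context
```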
And then in each APPFOLDER/filename.robot, calling this as a resource:
Library    Browser
Resource    ../common.resource
Metadata    Version    202603051743
Suite Setup    Standard Browser Setup
Feedback or ‘better ways’ advice welcome but I’m currently happy with the results and looking forward to showcasing more robotframework testing possibilities to coworkers.
Sincerely,
Scotsie
PS: Officially removed robotmk v1.5 rules and extension today!
I did not have to install any additional packages but this was the same host I had running the legacy plugins so it had all the packages and dependencies from the earlier build.
My salt state file I used to prep for the robotmk 1.4/5 agent is below. I suspect the packages and rfbrowser init brought in the items needed.
If I build or redeploy I will try to take note and send you a message.