CMK version:
2.1.0.p9 (CMK Raw)
OS version:
Ubuntu Server 20.04.4 LTS (Built myself from ISO)
Ubuntu Server 18.04.6 LTS (Virtual Appliance from OVA)
Error message:
TLS agent connections stopped working on all of my Ubuntu hosts after I upgraded their agents from 2.0.something to 2.1.0.p9.
RHEL and Alma hosts are having no issues.
I’m having to remove the TLS registration on the server for my Ubuntu hosts in order to restore monitoring; otherwise only the vCenter host-level checks work… All agent-based monitoring dies.
Several errors can be observed on an agent’s status page.
Service: Check_MK
[agent] Host is registered for TLS but not using it (CRIT), [piggyback] Successfully processed from source 'vCenter.mydomain.local', Missing monitoring data for plugins: checkmk_agent, df, kernel_util, lnx_if, mem_linux, systemd_units_services_summary, tcp_conn_stats, uptime (WARN), execution time 0.1 sec
Some hosts (but not all) will also display:
Service: Systemd Service Summary
service failed (cmk-agent-ctl-daemon) (CRIT)
If I go to the connection tests for that host:
Agent Test:
Host is registered for TLS but not using it
<<<esx_vsphere_vm:cached(1660458089,90)>>>
config.datastoreUrl name NFS_Diskstation02-01|accessible true|capacity 17251337441280|freeSpace 7659080781824|maintenanceMode normal|type NFS41|uncommitted 4115428454400|url ds:///vmfs/volumes/b5b7b7b5-219c4d2f-0000-000000000000/
config.guestFullName Ubuntu Linux (64-bit)
config.hardware.device virtualDeviceType VirtualCdrom|label CD/DVD drive 1|summary Remote ATAPI|startConnected false|allowGuestControl true|connected false|status ok@@virtualDeviceType VirtualVmxnet3|label Network adapter 1|summary DVSwitch: 50 15 fe ea 4d 16 69 13-e8 23 38 09 b0 14 05 cd|startConnected true|allowGuestControl true|connected true|status ok
config.hardware.memoryMB 2048
config.hardware.numCPU 2
config.hardware.numCoresPerSocket 1
config.template false
config.uuid 42150d85-f1ae-71ef-4b6c-43f4baca4ffc
config.version vmx-19
guest.toolsVersion 11360
guest.toolsVersionStatus guestToolsUnmanaged
guestHeartbeatStatus green
name vmansible01
runtime.host arthur.MyDomain.local
runtime.powerState poweredOn
summary.guest.hostName vmansible01
summary.quickStats.balloonedMemory 0
summary.quickStats.compressedMemory 0
summary.quickStats.consumedOverheadMemory 40
summary.quickStats.distributedCpuEntitlement 179
summary.quickStats.distributedMemoryEntitlement 931
summary.quickStats.guestMemoryUsage 163
summary.quickStats.hostMemoryUsage 2084
summary.quickStats.overallCpuDemand 179
summary.quickStats.overallCpuUsage 161
summary.quickStats.privateMemory 2044
summary.quickStats.sharedMemory 0
summary.quickStats.staticCpuEntitlement 1086
summary.quickStats.staticMemoryEntitlement 2318
summary.quickStats.swappedMemory 0
summary.quickStats.uptimeSeconds 112368
<<<labels:sep(0)>>>
{"cmk/piggyback_source_vCenter.MyDomain.local": "yes"}
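The agent test output above is easier to judge once its section headers are listed: only the piggybacked `esx_vsphere_vm` and `labels` sections are present, and none of the host's own agent sections (`df`, `uptime`, and so on) arrive. A minimal sketch for pulling out the section names, assuming headers have the `<<<name:options>>>` shape seen above; the helper name `section_names` is made up for illustration and is not part of Checkmk.

```python
import re

# Matches <<<name>>> or <<<name:options>>>, but not four-bracket <<<<...>>>> markers.
SECTION_HEADER = re.compile(r"(?<!<)<<<([^<>:]+)(?::[^<>]*)?>>>(?!>)")


def section_names(agent_output):
    """Return the distinct section names found in raw agent output, in order."""
    names = []
    for match in SECTION_HEADER.finditer(agent_output):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    import sys

    # Usage idea: paste or pipe the raw agent output into this script.
    print(section_names(sys.stdin.read()))
```

If the list contains nothing but piggyback sections, the data is coming from the vCenter source only and the host's own agent is not being read.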
Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)
First: Revert all the changes you describe here.
Then take a look at /etc/xinetd.d/; there might be a residual configuration file called checkmk or similar.
You can either delete that file or remove xinetd altogether.
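To look for such a leftover without guessing the file name, one option is to scan the xinetd configuration directory for files that mention the agent. This is only a sketch: the function name `find_xinetd_leftovers` is made up, and it assumes the leftover file contains the agent port 6556 or a `check_mk`/`check-mk` string, which may not hold on every system. It only reports files and deletes nothing.

```python
from pathlib import Path


def find_xinetd_leftovers(config_dir="/etc/xinetd.d",
                          needles=("6556", "check_mk", "check-mk")):
    """Return xinetd config files whose content mentions the agent.

    The needles are assumptions about what a leftover file contains.
    """
    directory = Path(config_dir)
    if not directory.is_dir():
        return []  # xinetd is not installed, so there is nothing to clean up
    hits = []
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        if any(needle in text for needle in needles):
            hits.append(path)
    return hits


if __name__ == "__main__":
    for leftover in find_xinetd_leftovers():
        print(f"possible leftover agent config: {leftover}")
```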
I had purged that file during some of my tests but the issue persisted.
I had registered TLS per the doc you referenced, which usually worked for a few minutes until it suddenly broke. After that I have to disable the check_mk.socket for it to work right.
No issues on AlmaLinux though, just my Ubuntu hosts.
The socket appears to be somehow re-enabling itself on at least one host =/
So, the band-aid stopped working on one host.
How can I go about purging all traces of any binaries, services and configuration files that may be left over from 2.0.x?
I’m thinking a complete purge followed by a fresh install of 2.1 may be the way to go.
In order to get the system linked again I had to do this.
These three unit files should not be there.
After removing the old agent you can remove these. I would stop the socket before removing the unit.
If you then install the 2.1 agent you should only have unit files inside “/lib/systemd/system/”.
There should now be three:
check-mk-agent-async.service
check-mk-agent@.service
check-mk-agent.socket
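The check described above can be scripted so it is repeatable across many hosts. The sketch below reports which of the three expected units are missing from “/lib/systemd/system/” and which agent-looking unit files sit elsewhere. It rests on assumptions: that leftovers live directly in /etc/systemd/system and that their names start with `check_mk` or `check-mk-agent`. The function name `audit_agent_units` is made up, and nothing is stopped or removed.

```python
from pathlib import Path

# Unit files that a clean 2.1 agent install should provide, per the list above.
EXPECTED_UNITS = {
    "check-mk-agent-async.service",
    "check-mk-agent@.service",
    "check-mk-agent.socket",
}


def audit_agent_units(package_dir="/lib/systemd/system",
                      other_dirs=("/etc/systemd/system",)):
    """Return (missing, stray).

    missing: expected unit names not present in package_dir.
    stray:   agent-looking unit files found in other_dirs (assumed leftovers).
    """
    package = Path(package_dir)
    missing = sorted(n for n in EXPECTED_UNITS if not (package / n).exists())
    stray = []
    for directory in map(Path, other_dirs):
        if not directory.is_dir():
            continue
        for path in sorted(directory.iterdir()):
            # Assumption: legacy units are named check_mk* or check-mk-agent*.
            if path.name.startswith(("check_mk", "check-mk-agent")):
                stray.append(path)
    return missing, stray


if __name__ == "__main__":
    missing, stray = audit_agent_units()
    print("missing 2.1 units:", missing)
    print("possible legacy units:", [str(p) for p in stray])
```

Any unit reported as stray should be stopped first and then removed by hand, as described above.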
The port 6556 should be used by the agent controller (cmk-agent-ctl).
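To confirm which process actually holds the port, the listener table from `ss -tlnp` can be filtered for 6556. A rough sketch, assuming the usual `ss` column layout (local address in the fourth column, process info in a `users:(("name",pid=...,fd=...))` field) and root privileges so that process names are shown. The helper name `listeners_on_port` is made up.

```python
import re
import subprocess


def listeners_on_port(ss_output, port=6556):
    """Extract names of processes listening on `port` from `ss -tlnp` text."""
    names = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Fourth column is the local address, e.g. 0.0.0.0:6556 or *:6556.
        if len(fields) < 4 or not fields[3].endswith(f":{port}"):
            continue
        # Process names appear as ("name",pid=...,fd=...) entries.
        names.update(re.findall(r'\("([^"]+)"', line))
    return sorted(names)


if __name__ == "__main__":
    result = subprocess.run(["ss", "-tlnp"], capture_output=True,
                            text=True, check=False)
    print(listeners_on_port(result.stdout))
```

If anything other than `cmk-agent-ctl` is reported for the port, that points to something else still owning it.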
Could this possibly be added to the Linux agent troubleshooting page, since it doesn’t appear to be a one-off issue? Or maybe a check for those old check_mk service files could even be added to future 2.1.x installers?
Today I had a short discussion with Tribe staff and submitted the findings for Ubuntu/Debian.
I think something has to change, because the 2.1 agent currently cannot be deployed in large environments as it is. Every deployment needs manual or scripted steps, and agent bakery updates do not work in this situation at the moment.