Brice187
(Lars Kerick)
May 27, 2022, 7:51am
CMK version:
2.1 stable
checkmk/cadvisor-patched:main_2022.03.02 and checkmk/cadvisor-patched:main_2022.05.26
The first cadvisor-patched image is the one from the official GitHub Helm chart; I tried main_2022.05.26 to see whether the bug might already have been fixed.
OS version:
Debian Bullseye
Error message:
Logs from checkmk-node-collector-container-metrics-XXXXX:
cadvisor W0527 07:45:14.605797 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
cadvisor W0527 07:45:16.577284 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
cadvisor W0527 07:45:17.412378 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
cadvisor W0527 07:45:17.535796 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
cadvisor W0527 07:45:18.152667 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
cadvisor W0527 07:45:18.233585 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
cadvisor W0527 07:45:18.240729 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
cadvisor W0527 07:45:18.478526 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
cadvisor W0527 07:45:20.215751 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
cadvisor W0527 07:45:20.254583 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
cadvisor W0527 07:45:20.349195 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
cadvisor W0527 07:45:21.366708 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
container-metrics-collector INFO: 2022-05-27 07:45:21,821 - Shut down gracefully
cadvisor I0527 07:45:21.822206 1 manager.go:1193] Exiting thread watching subcontainers
cadvisor I0527 07:45:21.822243 1 manager.go:407] Exiting global housekeeping thread
cadvisor I0527 07:45:21.823160 1 cadvisor.go:210] Exiting given signal: terminated
Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)
What version of Kubernetes are you running?
Is it on-premises or managed?
Which container runtime are you using?
Did you set up the Kubernetes monitoring as described here: Kubernetes überwachen ("Monitoring Kubernetes")?
cadvisor W0527 07:45:20.349195 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
Can you log into the cadvisor container and run “cat /proc//smaps”?
I am not able to reproduce the problem. Are you using PSP (PodSecurityPolicy) in your setup? In the cadvisor container we drop all capabilities and add only CAP_SYS_PTRACE. With this capability, access to /proc/PID/smaps should be possible.
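One way to verify that CAP_SYS_PTRACE actually ended up in a container's effective capability set is to decode the `CapEff:` mask from `/proc/self/status` (or `/proc/<PID>/status`); CAP_SYS_PTRACE is bit 19 in linux/capability.h. The following is a minimal Python sketch, assuming a Python interpreter is available where you run it (the busybox-based cadvisor image in this thread may not ship one). `effective_caps` and `has_capability` are hypothetical helper names, not part of Checkmk or cadvisor.

```python
from pathlib import Path

# Bit number of CAP_SYS_PTRACE as defined in linux/capability.h.
CAP_SYS_PTRACE = 19


def effective_caps(status_path="/proc/self/status"):
    """Return the hex CapEff mask from a /proc/<pid>/status file."""
    for line in Path(status_path).read_text().splitlines():
        if line.startswith("CapEff:"):
            return line.split()[1]
    raise ValueError(f"no CapEff line in {status_path}")


def has_capability(cap_eff_hex, cap_bit):
    """Return True if bit `cap_bit` is set in a hex capability mask."""
    return bool((int(cap_eff_hex, 16) >> cap_bit) & 1)
```

Inside the cadvisor container, `has_capability(effective_caps(), CAP_SYS_PTRACE)` would be expected to return True if the `add: ["CAP_SYS_PTRACE"]` setting took effect.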
Brice187
(Lars Kerick)
May 27, 2022, 3:02pm
PSP is enabled; I used the normal one, i.e. the default from values.yaml:
/ # cat /proc/smaps
cat: can't open '/proc/smaps': No such file or directory
Without PSP it works, though. Thank you very much!
It should be /proc/PID/smaps.
PID stands for process ID, which is basically a number such as 1, 30, etc.
Brice187
(Lars Kerick)
May 27, 2022, 5:57pm
/ # ls /proc/PID
ls: /proc/PID: No such file or directory
Oh, it doesn't work after all. The dashboard UI had changed somewhat, which is why I thought that was the solution.
My values.yaml:
image:
  # Overrides the image tag whose default is the chart appVersion.
  # ref: https://hub.docker.com/r/checkmk/kubernetes-collector/tags
  tag: "main_2022.03.02" # main_<YYYY.MM.DD>
rbac:
  pspEnabled: false
networkPolicy:
  enabled: false
  allowIngressFromCIDRs: []
  egressKubeApiserver:
    enableCidrLookup: true
## Configuration for cluster-collector
clusterCollector:
  image:
    repository: myregistry.tld/checkmk/kubernetes-collector
    pullPolicy: IfNotPresent
  # can be: "debug", "info", "warning" (default), "critical"
  logLevel: warning
  podAnnotations:
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  podSecurityContext: {}
    # fsGroup: 2000
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    privileged: false
    readOnlyRootFilesystem: true
    runAsGroup: 10001
    runAsNonRoot: true
    runAsUser: 10001
  service:
    # if required specify "NodePort" here to expose the cluster-collector via the "nodePort" specified below
    type: ClusterIP
    port: 8080
    # nodePort: 30035
  ingress:
    enabled: true
    className: "nginx"
    annotations:
      nginx.ingress.kubernetes.io/rewrite-target: /$2
    hosts:
      - host: services.internal
        paths:
          - path: /checkmk(/|$)(.*)
            pathType: Prefix
    tls:
      - secretName: services-secret
        hosts:
          - services.internal
  livenessProbe:
    initialDelaySeconds: 3
    periodSeconds: 10
    timeoutSeconds: 2
    failureThreshold: 3
  resources:
    limits:
      cpu: 300m
      memory: 200Mi
    requests:
      cpu: 150m
      memory: 200Mi
  nodeSelector: {}
  tolerations: []
  affinity: {}
## Configuration for node-collector components (cadvisor, container-metrics, machine-sections)
nodeCollector:
  # logLevel for container-metrics and machine-sections; can be: "debug", "info", "warning" (default), "critical"
  logLevel: debug
  # Pods of nodeCollectors will typically be ready for a short amount of time before detecting
  # problems. The value below ensures, that they don't become available as well.
  minReadySeconds: 15
  # Annotations to be added to node-collector pods
  podAnnotations:
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  podSecurityContext: {}
    # fsGroup: 2000
  ## Assign a nodeSelector if operating a hybrid cluster
  ##
  nodeSelector: {}
    # beta.kubernetes.io/arch: amd64
    # beta.kubernetes.io/os: linux
  tolerations: []
  # - effect: NoSchedule
  #   operator: Exists
  ## Assign a PriorityClassName to pods if set
  # priorityClassName: ""
  cadvisor:
    image:
      repository: myregistry.tld/checkmk/cadvisor-patched
      pullPolicy: IfNotPresent
    additionalArgs:
      - "--housekeeping_interval=30s"
      - "--max_housekeeping_interval=35s"
      - "--event_storage_event_limit=default=0"
      - "--event_storage_age_limit=default=0"
      - "--store_container_labels=false"
      - "--whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace,io.kubernetes.pod.uid"
      - "--global_housekeeping_interval=30s"
      - "--event_storage_event_limit=default=0"
      - "--event_storage_age_limit=default=0"
      - "--disable_metrics=percpu,process,sched,tcp,udp,diskIO,disk,network"
      - "--allow_dynamic_housekeeping=true"
      - "--storage_duration=1m0s"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
        add: ["CAP_SYS_PTRACE"]
      privileged: false
      readOnlyRootFilesystem: true
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 200Mi
  containerMetricsCollector:
    image:
      repository: myregistry.tld/checkmk/kubernetes-collector
      pullPolicy: IfNotPresent
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 10001
      runAsNonRoot: true
      runAsUser: 10001
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 200Mi
  machineSectionsCollector:
    image:
      repository: myregistry.tld/checkmk/kubernetes-collector
      pullPolicy: IfNotPresent
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 10001
      runAsNonRoot: true
      runAsUser: 10001
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 200Mi
Brice187
(Lars Kerick)
May 27, 2022, 6:39pm
Even if I remove the securityContext from the cAdvisor container in the DaemonSet, I still get no access to /proc/PID/ in the shell:
/ # ls /proc/
1 buddyinfo cpuinfo driver fs kallsyms kpagecgroup meminfo net schedstat swaps timer_list vmstat
22 bus crypto dynamic_debug interrupts kcore kpagecount misc pagetypeinfo self sys tty zoneinfo
34 cgroups devices execdomains iomem key-users kpageflags modules partitions slabinfo sysrq-trigger uptime
acpi cmdline diskstats fb ioports keys loadavg mounts pressure softirqs sysvipc version
asound consoles dma filesystems irq kmsg locks mtrr sched_debug stat thread-self vmallocinfo
/ # find /proc -name "PID*"
/ #
These are the process IDs: look at the numbers 1, 22 and 34. Now you should be able to run cat /proc/22/smaps.
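The same PID selection can be written in a few lines of Python: keep only the purely numeric entries of a `/proc` listing, build `/proc/<PID>/smaps` for each, and try to open it. This is an illustrative sketch; `pids_from_listing`, `smaps_path` and `readable_smaps` are hypothetical names, and `LISTING` is an excerpt of the `ls /proc/` output from the post above.

```python
import os


def pids_from_listing(names):
    """Return the purely numeric /proc entries (the PIDs), sorted ascending."""
    return sorted(int(name) for name in names if name.isdecimal())


def smaps_path(pid, proc_root="/proc"):
    """Build the /proc/<PID>/smaps path for one process."""
    return os.path.join(proc_root, str(pid), "smaps")


def readable_smaps(proc_root="/proc"):
    """Return the PIDs under proc_root whose smaps file can be opened and read."""
    readable = []
    for pid in pids_from_listing(os.listdir(proc_root)):
        try:
            with open(smaps_path(pid, proc_root), "rb") as f:
                f.read(1)
        except OSError:  # process exited, permission denied, no such file, ...
            continue
        readable.append(pid)
    return readable


# Excerpt of the `ls /proc/` output shown in the thread.
LISTING = "1 buddyinfo cpuinfo 22 bus crypto 34 cgroups self thread-self".split()
print(pids_from_listing(LISTING))  # [1, 22, 34]
```

Entries such as `self` and `thread-self` are filtered out because they are not numeric, even though they behave like process directories.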
Brice187
(Lars Kerick)
May 28, 2022, 8:54am
The container still throws:
cadvisor W0528 08:52:06.005327 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
However, cat /proc/30/smaps does work:
/ # cat /proc/30/smaps
556b0b437000-556b0b443000 r--p 00000000 08:01 2891404 /bin/busybox
Size: 48 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 48 kB
Pss: 24 kB
Shared_Clean: 48 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 48 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
VmFlags: rd mr mw me dw sd
...
Switching to cadvisor-patched:101 (Docker Hub) did not help either.
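To make the warning more concrete: as I understand cadvisor's referenced-memory collection (the `handler.go` code that logs this line), it reads `/proc/<PID>/smaps` for every PID of a monitored container's cgroup, sums the `Referenced:` values, skips PIDs whose file is missing, and logs the warning when none of the PIDs yielded a readable file. The Python below is a simplified re-creation under that assumption, not cadvisor's actual code; `referenced_kbytes` and its dict-based input are hypothetical. It would also explain why a successful `cat /proc/30/smaps` in a shell does not silence the warning: the check concerns the PIDs of the monitored containers, not the processes of the shell.

```python
import re

# Matches smaps lines such as "Referenced:           48 kB".
REFERENCED_RE = re.compile(r"^Referenced:\s+(\d+) kB$", re.MULTILINE)
WARNING = "Cannot read smaps files for any PID from CONTAINER"


def referenced_kbytes(smaps_by_pid):
    """Sum the Referenced: values over the smaps contents of several PIDs.

    `smaps_by_pid` maps a PID to the text of its smaps file, or to None when
    the file could not be read. Returns (total_kb, warning_or_None).
    """
    total_kb = 0
    read_any = False
    for text in smaps_by_pid.values():
        if text is None:  # missing or unreadable smaps file: skip this PID
            continue
        read_any = True
        total_kb += sum(int(kb) for kb in REFERENCED_RE.findall(text))
    # Warn only if there were PIDs to look at and none of them was readable.
    warning = WARNING if smaps_by_pid and not read_any else None
    return total_kb, warning
```

For the single mapping shown above (`Referenced: 48 kB`), `referenced_kbytes({30: text})` would return 48 and no warning; only a container whose PIDs all map to None produces the log line.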
CheckMK version: 2.1
OS: Debian 11
Kubernetes/K3s version: v1.23.8+k3s2
CheckMK_kube_agent: v1.1.0
I'm able to read the smaps file when I enter the pod/container, but I still receive this error in the logs:
W0929 11:54:49.980403 1 manager.go:159] Cannot detect current cgroup on cgroup v2
W0929 11:54:55.056123 1 machine_libipmctl.go:64] There are no NVM devices!
W0929 11:54:55.059217 1 info.go:53] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
W0929 11:54:55.061511 1 manager.go:291] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
W0929 11:54:55.064681 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
W0929 11:54:55.066200 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
W0929 11:54:55.066738 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
W0929 11:54:55.067114 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
W0929 11:54:55.068076 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
W0929 11:54:55.068325 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
Brice187
(Lars Kerick)
January 3, 2023, 9:33am
We have the same problem with rke2. Are there plans to install the new version in the Dockerfile?
bmalynovytch
(Benjamin MALYNOVYTCH)
December 5, 2023, 4:00pm
For now, I have worked around it by adding --disable_metrics=referenced_memory to nodeCollector.cadvisor.additionalArgs, as suggested in the cadvisor issue.
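The values.yaml earlier in this thread already passes `--disable_metrics=percpu,process,sched,tcp,udp,diskIO,disk,network`. Whether cadvisor merges a second `--disable_metrics` flag with the first or simply keeps the last one is not confirmed here, so a safer variant of the workaround is to append `referenced_memory` to the existing list. A small Python sketch of that edit (`add_disabled_metric` is a hypothetical helper):

```python
PREFIX = "--disable_metrics="


def add_disabled_metric(args, metric):
    """Return a copy of cadvisor's argument list with `metric` disabled.

    Extends the first existing --disable_metrics=... entry instead of appending
    a second flag; adds a new flag only if none is present.
    """
    result = list(args)
    for i, arg in enumerate(result):
        if arg.startswith(PREFIX):
            metrics = [m for m in arg[len(PREFIX):].split(",") if m]
            if metric not in metrics:
                metrics.append(metric)
            result[i] = PREFIX + ",".join(metrics)
            return result
    result.append(PREFIX + metric)  # no existing flag: add one
    return result


args = [
    "--storage_duration=1m0s",
    "--disable_metrics=percpu,process,sched,tcp,udp,diskIO,disk,network",
]
print(add_disabled_metric(args, "referenced_memory")[1])
```

Applied to these arguments, it yields the single entry `--disable_metrics=percpu,process,sched,tcp,udp,diskIO,disk,network,referenced_memory`, which can replace the existing line in nodeCollector.cadvisor.additionalArgs.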