Checkmk_kube_agent -> Cannot read smaps files for any PID from CONTAINER

CMK version:

  • 2.1 stable
  • checkmk/cadvisor-patched:main_2022.03.02 and checkmk/cadvisor-patched:main_2022.05.26

The first cadvisor-patched tag is the one from the official GitHub Helm chart; I tried main_2022.05.26 to see whether the bug had already been fixed.

OS version:
Debian Bullseye

Error message:

Logs checkmk-node-collector-container-metrics-XXXXX:

cadvisor W0527 07:45:14.605797       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
cadvisor W0527 07:45:16.577284       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
cadvisor W0527 07:45:17.412378       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
cadvisor W0527 07:45:17.535796       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
cadvisor W0527 07:45:18.152667       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
cadvisor W0527 07:45:18.233585       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
cadvisor W0527 07:45:18.240729       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
cadvisor W0527 07:45:18.478526       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
cadvisor W0527 07:45:20.215751       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
cadvisor W0527 07:45:20.254583       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
cadvisor W0527 07:45:20.349195       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
cadvisor W0527 07:45:21.366708       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         
container-metrics-collector INFO:     2022-05-27 07:45:21,821 - Shut down gracefully                                                                                                                                              
cadvisor I0527 07:45:21.822206       1 manager.go:1193] Exiting thread watching subcontainers                                                                                                                                     
cadvisor I0527 07:45:21.822243       1 manager.go:407] Exiting global housekeeping thread                                                                                                                                         
cadvisor I0527 07:45:21.823160       1 cadvisor.go:210] Exiting given signal: terminated     

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

Which version of Kubernetes are you running?
Is it on-premises or managed?
Which container runtime do you use?

Did you set up the Kubernetes monitoring as described here? Monitoring Kubernetes

  • v1.22.7
  • on premise
  • containerd
  • yes, and the monitoring works so far, except for the metrics

cadvisor W0527 07:45:20.349195 1 handler.go:426] Cannot read smaps files for any PID from CONTAINER

Can you log in to the cadvisor container and run “cat /proc//smaps”?
I am not able to reproduce the problem. Do you use PSP in your setup? In the cadvisor container we add the capability CAP_SYS_PTRACE and drop all others. With this capability, access to /proc/PID/smaps should be possible.

PSP is on; I simply took the default from the values.yaml:

/ # cat /proc/smaps
cat: can't open '/proc/smaps': No such file or directory

Without it, it works though :wink: Many thanks

It should be /proc/PID/smaps.
PID is a process ID, which is basically a number such as 1, 30, etc.

/ # ls /proc/PID
ls: /proc/PID: No such file or directory

Oh, it still doesn't work after all. The dashboard UI had changed a little, which is why I thought that was the solution.

My values.yaml:

image:
  # Overrides the image tag whose default is the chart appVersion.
  # ref: https://hub.docker.com/r/checkmk/kubernetes-collector/tags
  tag: "main_2022.03.02" # main_<YYYY.MM.DD>

rbac:
  pspEnabled: false

networkPolicy:
  enabled: false
  allowIngressFromCIDRs: []
  egressKubeApiserver:
    enableCidrLookup: true

## Configuration for cluster-collector
clusterCollector:
  image:
    repository: myregistry.tld/checkmk/kubernetes-collector
    pullPolicy: IfNotPresent

  # can be: "debug", "info", "warning" (default), "critical"
  logLevel: warning

  podAnnotations:
    seccomp.security.alpha.kubernetes.io/pod: runtime/default

  podSecurityContext: {}
    # fsGroup: 2000

  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    privileged: false
    readOnlyRootFilesystem: true
    runAsGroup: 10001
    runAsNonRoot: true
    runAsUser: 10001

  service:
    # if required specify "NodePort" here to expose the cluster-collector via the "nodePort" specified below
    type: ClusterIP
    port: 8080
    # nodePort: 30035

  ingress:
    enabled: true
    className: "nginx"
    annotations:
      nginx.ingress.kubernetes.io/rewrite-target: /$2
    hosts:
      - host: services.internal
        paths:
          - path: /checkmk(/|$)(.*)
            pathType: Prefix
    tls:
      - secretName: services-secret
        hosts:
          - services.internal

  livenessProbe:
    initialDelaySeconds: 3
    periodSeconds: 10
    timeoutSeconds: 2
    failureThreshold: 3

  resources:
    limits:
      cpu: 300m
      memory: 200Mi
    requests:
      cpu: 150m
      memory: 200Mi

  nodeSelector: {}

  tolerations: []

  affinity: {}


## Configuration for node-collector components (cadvisor, container-metrics, machine-sections)
nodeCollector:
  # logLevel for container-metrics and machine-sections; can be: "debug", "info", "warning" (default), "critical"
  logLevel: debug

  # Pods of nodeCollectors will typically be ready for a short amount of time before detecting
  # problems. The value below ensures, that they don't become available as well.
  minReadySeconds: 15

  # Annotations to be added to node-collector pods
  podAnnotations:
    seccomp.security.alpha.kubernetes.io/pod: runtime/default

  podSecurityContext: {}
    # fsGroup: 2000

  ## Assign a nodeSelector if operating a hybrid cluster
  ##
  nodeSelector: {}
  #   beta.kubernetes.io/arch: amd64
  #   beta.kubernetes.io/os: linux

  tolerations: []
  # - effect: NoSchedule
  #   operator: Exists

  ## Assign a PriorityClassName to pods if set
  # priorityClassName: ""

  cadvisor:
    image:
      repository: myregistry.tld/checkmk/cadvisor-patched
      pullPolicy: IfNotPresent

    additionalArgs:
      - "--housekeeping_interval=30s"
      - "--max_housekeeping_interval=35s"
      - "--event_storage_event_limit=default=0"
      - "--event_storage_age_limit=default=0"
      - "--store_container_labels=false"
      - "--whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace,io.kubernetes.pod.uid"
      - "--global_housekeeping_interval=30s"
      - "--event_storage_event_limit=default=0"
      - "--event_storage_age_limit=default=0"
      - "--disable_metrics=percpu,process,sched,tcp,udp,diskIO,disk,network"
      - "--allow_dynamic_housekeeping=true"
      - "--storage_duration=1m0s"

    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
        add: ["CAP_SYS_PTRACE"]
      privileged: false
      readOnlyRootFilesystem: true

    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 200Mi

  containerMetricsCollector:
    image:
      repository: myregistry.tld/checkmk/kubernetes-collector
      pullPolicy: IfNotPresent

    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 10001
      runAsNonRoot: true
      runAsUser: 10001

    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 200Mi

  machineSectionsCollector:
    image:
      repository: myregistry.tld/checkmk/kubernetes-collector
      pullPolicy: IfNotPresent

    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 10001
      runAsNonRoot: true
      runAsUser: 10001

    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 200Mi

Even if I remove the securityContext: from the cAdvisor container in the DaemonSet, I get no access to /proc/PID/ in the shell:

/ # ls /proc/
1              buddyinfo      cpuinfo        driver         fs             kallsyms       kpagecgroup    meminfo        net            schedstat      swaps          timer_list     vmstat
22             bus            crypto         dynamic_debug  interrupts     kcore          kpagecount     misc           pagetypeinfo   self           sys            tty            zoneinfo
34             cgroups        devices        execdomains    iomem          key-users      kpageflags     modules        partitions     slabinfo       sysrq-trigger  uptime
acpi           cmdline        diskstats      fb             ioports        keys           loadavg        mounts         pressure       softirqs       sysvipc        version
asound         consoles       dma            filesystems    irq            kmsg           locks          mtrr           sched_debug    stat           thread-self    vmallocinfo
/ # find /proc -name "PID*"
/ # 

The process IDs are listed here: look at the numbers 1, 22 and 34. You should now be able to run cat /proc/22/smaps.

The container still logs:
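
The manual check on a single PID can also be looped over every numeric entry in /proc. This is only a minimal sketch, assuming a POSIX shell such as the busybox one in the cadvisor image; check_smaps and its optional directory argument are hypothetical names introduced here for illustration, not part of Checkmk or cadvisor.

```shell
# Report, for every numeric directory under /proc, whether its smaps
# file can actually be read (an existence check alone would not show
# permission problems, so the file is really opened and read).
# The optional first argument (hypothetical) allows pointing the loop
# at a different directory than /proc.
check_smaps() {
    proc_root="${1:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        # Skip when the glob matched nothing or the process is gone.
        [ -d "$dir" ] || continue
        pid="${dir##*/}"
        if cat "$dir/smaps" >/dev/null 2>&1; then
            echo "$pid readable"
        else
            echo "$pid unreadable"
        fi
    done
}
```

Called without arguments (check_smaps) it scans /proc. Note that it only covers processes visible in the PID namespace of the shell it runs in.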

cadvisor W0528 08:52:06.005327       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER                                                                                                                         

However, cat /proc/30/smaps does work:

/ # cat /proc/30/smaps
556b0b437000-556b0b443000 r--p 00000000 08:01 2891404                    /bin/busybox
Size:                 48 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  48 kB
Pss:                  24 kB
Shared_Clean:         48 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           48 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd mr mw me dw sd 
...

Switching to cadvisor-patched:101 (Docker Hub) did not help either.

CheckMK version: 2.1
OS: Debian 11
Kubernetes/K3s version: v1.23.8+k3s2
CheckMK_kube_agent: v1.1.0

I'm able to read the smaps file when I enter the pod/container, but I get this error in the logs:

W0929 11:54:49.980403       1 manager.go:159] Cannot detect current cgroup on cgroup v2
W0929 11:54:55.056123       1 machine_libipmctl.go:64] There are no NVM devices!
W0929 11:54:55.059217       1 info.go:53] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
W0929 11:54:55.061511       1 manager.go:291] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
W0929 11:54:55.064681       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
W0929 11:54:55.066200       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
W0929 11:54:55.066738       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
W0929 11:54:55.067114       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
W0929 11:54:55.068076       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER
W0929 11:54:55.068325       1 handler.go:426] Cannot read smaps files for any PID from CONTAINER

This seems to be an upstream bug: Log flooded by "Cannot read smaps files for any PID from CONTAINER" · Issue #3139 · google/cadvisor · GitHub

We have the same problem with rke2. Are there plans to install the new version in the Dockerfile?

Right now, fixed temporarily by adding --disable_metrics=referenced_memory in nodeCollector.cadvisor.additionalArgs, as suggested in the cadvisor issue.
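
A sketch of how that workaround might look in the values.yaml posted earlier in this thread. One assumption is made here: since that file already passes a --disable_metrics list, referenced_memory is appended to the existing list instead of being added as a second flag, because how cadvisor treats a repeated flag was not verified. All other additionalArgs would stay as posted.

```yaml
nodeCollector:
  cadvisor:
    additionalArgs:
      # ... other arguments unchanged ...
      # Assumption: one --disable_metrics flag carrying every metric name,
      # with referenced_memory appended to the list from the original file.
      - "--disable_metrics=percpu,process,sched,tcp,udp,diskIO,disk,network,referenced_memory"
```

With referenced_memory disabled, cadvisor no longer collects the metric that requires reading smaps, which is why the warning is expected to stop according to the upstream issue.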