Error: could not create fsnotify watcher: too many open files (Kubernetes)

Hi everyone!

I have a problem when I deploy the CrowdSec agent on Kubernetes with Helm:
failed to configure datasource file: could not create fsnotify watcher: too many open files"

It looks like a sysctl user watch limit problem. OK, so I found this thread: Problem config for acquisition

I modified my sysctl config with this:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-setup
  namespace: kube-system
  labels:
    k8s-app: node-setup
spec:
  selector:
    matchLabels:
      name: node-setup
  template:
    metadata:
      labels:
        name: node-setup
    spec:
      containers:
      - name: node-setup
        image: ubuntu
        command: ["/bin/sh","-c"]
        args: ["/script/node-setup.sh; while true; do echo Sleeping && sleep 3600; done"]
        volumeMounts:
          - name: node-setup-script
            mountPath: /script
          - name: dev
            mountPath: /dev
          - name: etc-lvm
            mountPath: /etc/lvm
        securityContext:
          allowPrivilegeEscalation: true
          privileged: true
      volumes:
        - name: node-setup-script
          configMap:
            name: node-setup-script
            defaultMode: 0755
        - name: dev
          hostPath:
            path: /dev
        - name: etc-lvm
          hostPath:
            path: /etc/lvm
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-setup-script
  namespace: kube-system
data:
  node-setup.sh: |
    #!/bin/bash
    set -e

    # raise the inotify file-watcher limit on each node

    # insert the new value into the system config
    sysctl -w fs.inotify.max_user_watches=8192

    # check that the new value was applied
    cat /proc/sys/fs/inotify/max_user_watches

In the agent pods (or the LAPI pod), if I cat max_user_watches I can see the modified value (old value: 128).
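
To double-check, this is roughly how I verify it from one of the agent pods (the namespace and pod name are placeholders for my cluster):

kubectl -n crowdsec exec <agent-pod-name> -- cat /proc/sys/fs/inotify/max_user_watches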

I've tried destroying the namespace and upgrading, but I always get the same message:

time="03-01-2024 21:35:04" level=info msg="loading acquisition file : /etc/crowdsec/acquis.yaml"
time="03-01-2024 21:35:04" level=fatal msg="crowdsec init: while loading acquisition config: while configuring datasource of type file from /etc/crowdsec/acquis.yaml (position: 0): failed to configure datasource file: could not create fsnotify watcher: too many open files"
time="03-01-2024 21:35:04" level=fatal msg="crowdsec init: while loading acquisition config: while configuring datasource of type file from /etc/crowdsec/acquis.yaml (position: 0): failed to configure datasource file: could not create fsnotify watcher: too many open files"

Here is my values.yaml:
(I've tried setting poll_without_inotify to true, but the agent still tries fsnotify anyway.)

container_runtime: containerd

agent:
  # To specify each pod whose logs you want to process (pods present on the node)
  acquisition:
    # The namespace where the pod is located
    - namespace: ingress-nginx
      # The pod name
      podName: ingress-nginx-controller-*
      # as in crowdsec configuration, we need to specify the program name so the parser will match and parse logs
      program: nginx
      poll_without_inotify: false
  # resources:
  #   limits:
  #     memory: 100Mi
  #   requests:
  #     cpu: 150m
  #     memory: 100Mi
  env:
  - name: COLLECTIONS
    value: "crowdsecurity/nginx"
lapi:
  dashboard:
    # -- Enable Metabase Dashboard (by default disabled)
    enabled: true
    image:
      # -- docker image repository name
      repository: metabase/metabase
      # -- pullPolicy
      pullPolicy: IfNotPresent
      # -- docker image tag
      tag: "v0.48.1"
    # -- Metabase SQLite static DB containing Dashboards
    assetURL: https://crowdsec-statics-assets.s3-eu-west-1.amazonaws.com/metabase_sqlite.zip
    # -- Enable ingress object
    ingress:
      enabled: true
      annotations:
        # metabase only supports http so we need this annotation
        nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
        cert-manager.io/cluster-issuer: "letsencrypt-prod"
        nginx.ingress.kubernetes.io/auth-secret: ingress-metabase-cs
        nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
      # labels: {}
      ingressClassName: "nginx" # nginx
      host: "metabase.crowdsec.domain.fr"
      tls:
      - hosts:
        - metabase.crowdsec.domain.fr
        secretName: metabase-crowdsec-cert
      # tls: {}
  resources:
    limits:
      memory: 100Mi
    requests:
      cpu: 150m
      memory: 100Mi
  persistentVolume:
    data:
      enabled: true
      accessModes:
        - ReadWriteOnce
      storageClassName: ""
      existingClaim: ""
      size: 10Gi
    config:
      enabled: true
      accessModes:
        - ReadWriteOnce
      storageClassName: "scw-bssd"
      existingClaim: "lapi-crowdsec-pvc"
      size: 10Gi
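
For reference, the variant I had tried simply flips that flag in the same acquisition entry (everything else unchanged):

agent:
  acquisition:
    - namespace: ingress-nginx
      podName: ingress-nginx-controller-*
      program: nginx
      poll_without_inotify: true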

Thank you!

I destroyed everything and the poll_without_inotify: true option works!
I'll get back to you tomorrow to confirm.

If I upgrade the Helm release, one agent gets the error.
Deleting the namespace and reinstalling with Helm is OK. What is the problem?

ConfigMap in Kubernetes after deploy:
force_inotify: true
poll_without_inotify: true

Why is force_inotify true if I set poll_without_inotify?

Even though these options sound similar, they have different functions.

force_inotify: true

This will set up an inotify watch on the folder specified within the acquisition, in your example /var/log/containers/. I believe this is needed within k8s because when the log file name changes (due to rotation/renaming), the file would otherwise never be picked up again.

poll_without_inotify: true

This setting tells CrowdSec NOT to set up an inotify watch on the file itself. Normally inotify is used to get notified when a file changes; if you set this option to true, CrowdSec will instead manually stat the file every second, which can lead to higher CPU usage compared to using inotify.
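
To make it concrete, the rendered file datasource entry in /etc/crowdsec/acquis.yaml ends up looking roughly like this (the filename glob and labels below are illustrative, not copied from the chart output):

source: file
filenames:
  - /var/log/containers/ingress-nginx-controller-*_ingress-nginx_*.log
force_inotify: true
poll_without_inotify: true
labels:
  program: nginx

So force_inotify adds a watch on the /var/log/containers directory so newly created log files are noticed, while poll_without_inotify makes CrowdSec poll the matched files themselves instead of putting an inotify watch on each one.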

Okay, thanks, I understand!

But why, if I upgrade my Helm release and kill the agent pods, does the DaemonSet create an agent on each node but only one agent works?

The second one has this message:

time="04-01-2024 11:19:19" level=info msg="loading acquisition file : /etc/crowdsec/acquis.yaml"
time="04-01-2024 11:19:19" level=fatal msg="crowdsec init: while loading acquisition config: while configuring datasource of type file from /etc/crowdsec/acquis.yaml (position: 0): failed to configure datasource file: could not create fsnotify watcher: too many open files"
time="04-01-2024 11:19:19" level=fatal msg="crowdsec init: while loading acquisition config: while configuring datasource of type file from /etc/crowdsec/acquis.yaml (position: 0): failed to configure datasource file: could not create fsnotify watcher: too many open files"

The other agent works:

time="04-01-2024 11:17:37" level=info msg="loading acquisition file : /etc/crowdsec/acquis.yaml"
time="04-01-2024 11:17:37" level=info msg="Force add watch on /var/log/containers" type=file
time="04-01-2024 11:17:37" level=info msg="Adding file /var/log/containers/ingress-nginx-controller-jcr8t_ingress-nginx_controller-4b79c669e188e8b44b4f21c36368e608afa81d237f57c434a4e167f31453c65b.log to datasources" type=file
time="04-01-2024 11:17:37" level=info msg="Adding file /var/log/containers/ingress-nginx-controller-jcr8t_ingress-nginx_init-clone-crowdsec-bouncer-9819ce7e8ca58b7e62b668c5b2675595a23fccf35a8e1770d8c3679dd9cd8739.log to datasources" type=file
time="04-01-2024 11:17:37" level=info msg="Starting processing data"

I see just one log file for the nginx DaemonSet.

I think the problem is here.

With the agent pod running, I ran ls /var/log/containers:

ingress-nginx-controller-jcr8t_ingress-nginx_controller-4b79c669e188e8b44b4f21c36368e608afa81d237f57c434a4e167f31453c65b.log
ingress-nginx-controller-jcr8t_ingress-nginx_init-clone-crowdsec-bouncer-9819ce7e8ca58b7e62b668c5b2675595a23fccf35a8e1770d8c3679dd9cd8739.log

I tried a nikto scan with one agent pod in CrashLoopBackOff and one agent running, and I got a 404 forbidden response after the ban.

I'll inspect my nginx this afternoon.

I created Ubuntu pods, mounted the host path /var/log, and I can see the log files on each node:

ingress-nginx-controller-j9jrj_ingress-nginx_controller-d56226af6ab710a55a4262a1c5c93cb8dd1e9715c203d9ae60d72c15b3adae63.log
ingress-nginx-controller-j9jrj_ingress-nginx_init-clone-crowdsec-bouncer-70ebf1e8af4b074763bc7e5449b7caf9fe7bda08d76c99546d46015836febf4a.log


ingress-nginx-controller-tlqmj_ingress-nginx_controller-52def449e06da4f93b3cba6bbebc3832d1c28a22f2acab0094fd6722d9c7def7.log
ingress-nginx-controller-tlqmj_ingress-nginx_init-clone-crowdsec-bouncer-5f0258262f53fe3149aebdc32e7cb25757d7a328e448a44ac55ead1dbb025ce0.log
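
For reference, the debug pod was nothing more than this (a minimal sketch; the name is arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: debug-logs
spec:
  containers:
  - name: debug
    image: ubuntu
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: varlog
      mountPath: /var/log
      readOnly: true
  volumes:
  - name: varlog
    hostPath:
      path: /var/log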

I increased max_user_instances, and the second agent is up.

New node setup:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-setup
  namespace: kube-system
  labels:
    k8s-app: node-setup
spec:
  selector:
    matchLabels:
      name: node-setup
  template:
    metadata:
      labels:
        name: node-setup
    spec:
      containers:
      - name: node-setup
        image: ubuntu
        command: ["/bin/sh","-c"]
        args: ["/script/node-setup.sh; while true; do echo Sleeping && sleep 3600; done"]
        volumeMounts:
          - name: node-setup-script
            mountPath: /script
          - name: dev
            mountPath: /dev
          - name: etc-lvm
            mountPath: /etc/lvm
        securityContext:
          allowPrivilegeEscalation: true
          privileged: true
      volumes:
        - name: node-setup-script
          configMap:
            name: node-setup-script
            defaultMode: 0755
        - name: dev
          hostPath:
            path: /dev
        - name: etc-lvm
          hostPath:
            path: /etc/lvm
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-setup-script
  namespace: kube-system
data:
  node-setup.sh: |
    #!/bin/bash
    set -e

    # raise the inotify file-watcher and instance limits on each node

    # insert the new values into the system config
    sysctl -w fs.inotify.max_user_watches=524288
    sysctl -w fs.inotify.max_user_instances=8192

    # check that the new values were applied
    cat /proc/sys/fs/inotify/max_user_watches
    cat /proc/sys/fs/inotify/max_user_instances
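
To confirm the limits are applied on every node, I check them through the node-setup pods (a quick sketch with kubectl; the label selector matches the DaemonSet above):

kubectl -n kube-system get pods -l name=node-setup -o name \
  | xargs -I{} kubectl -n kube-system exec {} -- \
      cat /proc/sys/fs/inotify/max_user_watches /proc/sys/fs/inotify/max_user_instances

Note that sysctl -w is not persistent across node reboots, so the node-setup DaemonSet has to stay deployed (or the values have to be baked into the node image) for the limits to survive.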