Troubleshoot high RAM consumption

How can I troubleshoot high RAM usage in my crowdsec daemon?

RAM usage is currently 12 GB in total and still growing.

I used pprof and was shocked: 8 GB is consumed by github.com/crowdsecurity/crowdsec/pkg/leakybucket.NewQueue

Is it possible to lower its memory consumption? cache_size: 1 in the scenario configuration might be useful for me, but unfortunately it hurts the notification feature, so it is not suitable for me:

Cache size will affect the number of events you receive within an alert.
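
For reference, cache_size is a per-scenario setting that sits alongside the other leaky-bucket fields; a minimal sketch (the name and values are illustrative, not from my config) that would cap the in-memory queue at 10 events per bucket:

type: leaky
name: example/capped-queue   # hypothetical name, for illustration only
filter: "evt.Meta.log_type == 'endlessh_accept'"
groupby: evt.Meta.source_ip
leakspeed: "6h"
capacity: 127
cache_size: 10   # only the 10 most recent events are kept per bucket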

Could you share your scenarios so we know what you are trying to achieve?

/etc/crowdsec/scenarios/endlessh-distinctconn.yaml
type: leaky
leakspeed: "24h"
capacity: 3
name: skhron/endlessh-distinctconn
description: "Reacts on 4 distinct dstip:dstport pairs per srcip, default behaviour"
filter: "evt.Meta.log_type == 'endlessh_accept'"
groupby: evt.Meta.source_ip
distinct: evt.Meta.destination_ip + ":" + evt.Meta.destination_port
blackhole: 12h
reprocess: true
labels:
  service: endlessh
  type: scan
  remediation: true
/etc/crowdsec/scenarios/endlessh-highconn.yaml
type: leaky
leakspeed: "6h"
capacity: 127
name: skhron/endlessh-highconn
description: "Reacts on 128 connections per SrcIP, fallback behaviour used when attacker targets few DstIP:DstPort"
filter: "evt.Meta.log_type == 'endlessh_accept'"
groupby: evt.Meta.source_ip
distinct: evt.Meta.source_port
blackhole: 12h
reprocess: true
labels:
  service: endlessh
  type: scan
  remediation: true

Well, it would be beneficial to use cache_size, as that is the intended use case.

Why do you need all 128 events in a single notification?

But the problem is the high capacity of the bucket; if it can be divided, and the leakspeed along with it, the RAM usage will come down.
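
For illustration, that suggestion could look like the following tweak to the highconn scenario; note this is not equivalent detection behaviour (the bucket fires after 32 events instead of 128 and drains faster), so the false-positive tuning would need revisiting:

leakspeed: "90m"   # 6h divided by 4
capacity: 31       # overflows on the 32nd event instead of the 128th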

It would not do the job for me because I won’t receive all 4 or 128 events in a notification webhook.

I need all 4 or 128 events because, depending on the attacker, they might generate 4 connections to distinct DstIP:DstPort pairs over a long time frame, or connect faster but to a single destination. To prevent false positives, I wait for 128 connections if the distinctconn scenario hasn’t fired.

Then there is nothing to troubleshoot; you need to reduce the capacity and/or the leakspeed, otherwise each bucket will hold all 128 events in memory, leading to high consumption.
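
As a back-of-the-envelope illustration only (the bucket count and per-event size below are assumptions, not measurements), worst-case queue memory scales as buckets × capacity × event size:

buckets=100000; capacity=128; event_bytes=1024   # all three figures assumed
echo "$(( buckets * capacity * event_bytes / 1024 / 1024 )) MiB worst case"   # prints 12500 MiB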

Are there any plans to change cache_size behaviour to lessen the load on RAM but still provide all events to the notification hooks?

I’m not following the question; what you are asking is impossible. How would we send all the events without holding them?

I mean holding the events in the database; that should be much more RAM-friendly. Is it possible to achieve?

It would mean redesigning the whole system, so we would not do this.

However, the team is interested in getting a pprof snapshot of the high RAM usage to see if there is an issue.

Could you send a snapshot to laurence@crowdsec.net?

Is it from http://localhost:8081/ui/download?

You can redirect the output to a file:

curl http://localhost:6060/debug/pprof/heap > /tmp/heap.pprof

Then you can just send me the heap.pprof file.
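
If you want a quick look at the profile yourself before sending it (assuming a Go toolchain is installed), pprof can summarise it or serve the same web UI you mentioned:

go tool pprof -top /tmp/heap.pprof
go tool pprof -http=:8081 /tmp/heap.pprof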

Whatever port your prometheus endpoint listens on is the port you should use; by default it is 6060.
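
If you are unsure which port that is, it is set in the prometheus section of the main config; assuming the default path:

grep -A4 'prometheus:' /etc/crowdsec/config.yaml   # look for listen_port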