False Positive IP blocking

I see a repeated pattern of false positives and I can't understand why.
I set the debug setting to true, but it didn't seem to change the logging output.

I have an http-crawl-non_statics bucket with a capacity of 400 and a 1s leak.
I think this means that when the 401st request comes in, the bucket will spill and block the IP.

Here is what I see in the logs:
(Note: I deleted the ban after the first trigger, and you can see the re-trigger in the logs.)

time="29-09-2021 11:03:15" level=info msg="Ip 69.136.133.107 performed 'crowdsecurity/http-crawl-non_statics' (988 events over 14m15.619972416s) at 2021-09-29 11:03:15.222310069 -0500 CDT m=+7467.250285646"
time="29-09-2021 11:03:15" level=info msg="(e0329a75983f4477ab7afb7f6f09a1094IrAyC6dbDbTOMdU/crowdsec) crowdsecurity/http-crawl-non_statics by ip 69.136.133.107 (US) : 1h ban on Ip 69.136.133.107"
time="29-09-2021 11:06:48" level=info msg="Ip 69.136.133.107 performed 'crowdsecurity/http-crawl-non_statics' (603 events over 3m22.031372879s) at 2021-09-29 11:06:48.755867063 -0500 CDT m=+7680.783842640"
time="29-09-2021 11:06:49" level=info msg="(e0329a75983f4477ab7afb7f6f09a1094IrAyC6dbDbTOMdU/crowdsec) crowdsecurity/http-crawl-non_statics by ip 69.136.133.107 (US) : 1h ban on Ip 69.136.133.107"

Based on the log data, it says there were 603 events over roughly 3 minutes 22 seconds, which is only about 3 requests per second. I don't understand why it is logged this way, but in any case it tripped the IP block, and that rate is well under 400/sec.

Often when I check my HAProxy logs, the number of 'events' reported doesn't match the number of log lines recorded for the suspect IP in HAProxy.

What am I missing? We keep blocking people we don't want to block, and it seems like our thresholds should not be triggering on this traffic.

Here is my http-crawl-non_statics.yaml file:

type: leaky
name: crowdsecurity/http-crawl-non_statics
description: "Detect aggressive crawl from single ip"
filter: "evt.Meta.log_type in ['http_access-log', 'http_error-log'] && evt.Parsed.static_ressource == 'false'"
distinct: "evt.Parsed.file_name"
leakspeed: 1s
capacity: 420
debug: true
# this limits the memory cache (and event_sequences in the output) to five events
cache_size: 5
groupby: "evt.Meta.source_ip + '/' + evt.Parsed.target_fqdn"
blackhole: 1m
labels:
 service: http
 type: crawl
 remediation: true

Thanks.

Hello @haldrich!

The leakspeed represents the frequency at which an event will "leak" out of the bucket, while the capacity is how many items the bucket can hold at once.

In your case, a leakspeed of 1s and a capacity of 400 means the user is expected to perform up to one request per second: if they send one request every second, the bucket's fill level never goes above 1, because each event leaks out within one second of its arrival. The capacity of 400 is there to allow a "burst": you could do 400 requests at once (the bucket would be full but would not overflow), and after 200 seconds the bucket would hold 200 events, and 200 seconds after that it would be empty again.
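To make the mechanics concrete, here is a rough leaky-bucket sketch (this is not CrowdSec's actual code; the 401-request burst and the millisecond timings are invented purely for illustration). The key point is that an overflow depends on how many events are sitting in the bucket at the same moment, i.e. on how bursty the traffic is, not on the average rate over the whole window:

package main

import "fmt"

// Rough sketch of the leaky-bucket arithmetic described above. Each event
// pours one token into the bucket, and tokens drain at one per leakspeed
// interval. The bucket overflows when its fill level exceeds the capacity.
func main() {
    const capacity = 400.0
    const leakspeed = 1.0 // seconds per leaked event

    fill := 0.0
    last := 0.0 // timestamp of the previous event, in seconds

    // An invented burst: 401 requests arriving one millisecond apart.
    for i := 0; i < 401; i++ {
        now := float64(i) * 0.001

        // Leak whatever has drained since the previous event.
        fill -= (now - last) / leakspeed
        if fill < 0 {
            fill = 0
        }

        fill++ // pour the new event into the bucket
        last = now

        if fill > capacity {
            fmt.Printf("overflow at event %d (fill=%.2f)\n", i+1, fill)
            return
        }
    }
    fmt.Printf("no overflow, final fill=%.2f\n", fill)
}

With 400 requests in that burst the bucket ends just under capacity; the 401st pushes it over, which matches your "401st request" intuition for a pure burst. A steady stream of one request per second, on the other hand, keeps the fill level at 1 indefinitely.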

Please let me know if it's clearer to you now!

Don't hesitate to reach out over Gitter if you want to chat about it (or have us take a look at your logs to help).