Hi all !
I need to whitelist a specific user agent.
Example of the user agent :
Toto-Scraper/1.2.5 Acme (https://www.acme.com/en/digital-website)
I tried to look at the expr documentation, and I did something like
- evt.Parsed.http_user_agent matches "Toto-Scraper\/*.*.* Acme \(https:\/\/www.acme.com\/en\/digital-website\)"
but with this, CrowdSec fails to start with an error about an escape character.
For now I put
- evt.Parsed.http_user_agent matches "Toto-Scraper/*.*.* Acme (https://www.acme.com/en/digital-website)"
but I am not sure if it works.
Will
- evt.Parsed.http_user_agent matches "Toto-Scraper/*.*.*"
work ?
If I put
- evt.Parsed.http_user_agent matches "Toto-Scraper.*"
will it work too ?
So it seems those “expr” are not like real regexp, right ?
Thanks
PS : Acme and Toto are here to anonymize
Hello !
If you have a “specific” user-agent, you can even use a simple string match (==).
And yes, expr patterns are not PCRE regexps; they use Go’s RE2 syntax.
As for escaping: since the pattern sits inside a string, you need to escape twice, so:
- evt.Parsed.http_user_agent matches "Toto-Scraper/.* Acme \\(https://www.acme.com/en/digital-website\\)"
would do the trick
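To make the difference between the patterns in this thread concrete, here is a minimal Go sketch. It assumes that expr's matches operator behaves like an unanchored Go regexp match (regexp.MatchString); matchesUA is a hypothetical helper, not part of CrowdSec. The patterns are written as raw strings, i.e. what the regexp engine would receive once the doubled backslashes of the expr string literal have been unescaped.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesUA is a hypothetical helper: it reports whether the RE2 pattern
// matches anywhere in ua (an unanchored match).
func matchesUA(pattern, ua string) bool {
	return regexp.MustCompile(pattern).MatchString(ua)
}

func main() {
	ua := "Toto-Scraper/1.2.5 Acme (https://www.acme.com/en/digital-website)"

	patterns := []string{
		// Unescaped parentheses form a capture group, so the literal "(" in
		// the user agent is never consumed and the match fails.
		`Toto-Scraper/*.*.* Acme (https://www.acme.com/en/digital-website)`,
		// Escaped parentheses are matched literally.
		`Toto-Scraper/.* Acme \(https://www.acme.com/en/digital-website\)`,
		// "/*" means "zero or more slashes" and ".*" means "anything",
		// so these two match any string containing "Toto-Scraper".
		`Toto-Scraper/*.*.*`,
		`Toto-Scraper.*`,
	}
	for _, p := range patterns {
		fmt.Printf("%v\t%s\n", matchesUA(p, ua), p)
	}
}
```

Under that assumption, only the first pattern fails on the example user agent; the two short patterns match, but they are far looser than they look.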
Seems it did not work, my scraper got banned…
I put
- evt.Parsed.http_user_agent matches "Toto-Scraper.*"
Let’s see…
Hello,
I did some tests locally and it worked for me. Can you share some example logs that triggered a ban, plus your whitelist?
Thanks,
Hi Thibault
time="04-11-2021 11:55:47" level=info msg="Ip 34.245.204.174 performed 'crowdsecurity/http-probing' (13 events over 1m33.240032522s) at 2021-11-04 11:55:47.849649143 +0000 UTC m=+73604.010430156"
time="04-11-2021 11:55:48" level=info msg="(3c1f443a494fb47370e37e34680787eerc0sj7zOceruPRSl/crowdsec) crowdsecurity/http-probing by ip 34.245.204.174 (IE) : 4h ban on Ip 34.245.204.174"
Should there be more logs?
The YAML file is located in /etc/crowdsec/parsers/s02-enrich/
name: crowdsecurity/totoscraper
description: "Whitelist events from toto scraper"
whitelist:
  reason: "toto scraper"
  expression:
    - evt.Parsed.http_user_agent matches "Toto-Scraper/.* Acme \\(https://www.acme.com/en/digital-website\\)"
UA example:
Toto-Scraper/1.2.5 Acme (https://www.acme.com/en/digital-website)
Thanks
Hello,
Sorry, I meant the Apache/nginx logs that trigger the error
To be precise, there is no error (I think you meant “ban”)
34.245.204.174 - - [04/Nov/2021:10:31:43 +0000] "GET /lycee/contact@toto.fr HTTP/1.1" 404 27909 "ref=https://www.toto.fr/lycee/" "ua=Toto-Scraper/1.2.5 Sototo (https://www.acme.com/en/digital-website)"
"rLoc=-" "reqt=3.919" "respt=0.004 : 2.020" "host=www.toto.fr" "cache=MISS" "upstream=10.216.124.44:9010 : 10.216.124.47:9001" "uheadt=0.004 : 2.012"
(as always, toto and acme for anonymization)
Hi !
This can be closed: the issue was with the initial grok pattern. I guess the user agent was not captured correctly.
Now I see some whitelist events :
time="18-11-2021 16:59:19" level=info msg="Event is whitelisted by Expr !" id=misty-dew name=crowdsecurity/anonymousscraper stage=s02-enrich
Thanks !
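As a rough illustration of why a parsing problem can defeat a correct whitelist: the expression only ever sees whatever string the parser stored in http_user_agent. The sketch below is a hedged example, not what actually happened here; the captured values "" and "-" are hypothetical, whitelisted is an illustrative helper, and the match is assumed to behave like Go's regexp.MatchString.

```go
package main

import (
	"fmt"
	"regexp"
)

// uaPattern is the whitelist pattern from this thread, as the regexp engine
// would see it after the expr string literal is unescaped.
var uaPattern = regexp.MustCompile(`Toto-Scraper/.* Acme \(https://www.acme.com/en/digital-website\)`)

// whitelisted is a hypothetical helper: it applies the pattern to the value
// the parser captured for the user agent field.
func whitelisted(parsedUA string) bool {
	return uaPattern.MatchString(parsedUA)
}

func main() {
	// Hypothetical captured values: the first two stand in for a grok
	// pattern that failed to extract the user agent.
	captured := []string{
		"",
		"-",
		"Toto-Scraper/1.2.5 Acme (https://www.acme.com/en/digital-website)",
	}
	for _, ua := range captured {
		fmt.Printf("%q whitelisted: %v\n", ua, whitelisted(ua))
	}
}
```

Only the last value is whitelisted, so when a ban appears despite a valid expression, it is worth checking what the parser actually extracted.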