Whitelist user agent and regex expression

Hi all !
I need to whitelist a specific user agent.

Example of the user agent :

Toto-Scraper/1.2.5 Acme (https://www.acme.com/en/digital-website)

I looked at the expr documentation and tried something like:

   - evt.Parsed.http_user_agent matches "Toto-Scraper\/*.*.* Acme \(https:\/\/www.acme.com\/en\/digital-website\)"

but with this, CrowdSec fails to start with an error about an escape character.

For now I have put:

   - evt.Parsed.http_user_agent matches "Toto-Scraper/*.*.* Acme (https://www.acme.com/en/digital-website)"

but I am not sure if it works.

Will

   - evt.Parsed.http_user_agent matches  "Toto-Scraper/*.*.*"

work ?
If I put

   - evt.Parsed.http_user_agent matches  "Toto-Scraper.*"

will it work too ?

So it seems these “expr” patterns are not real regular expressions, right?

Thanks :slight_smile:

PS : Acme and Toto are here to anonymize :wink:

Hello !

If you have one specific user agent, you can even use a simple string comparison (==).
And yes, expr patterns are not PCRE regular expressions; they use Go’s RE2 syntax.

As for escaping: since the pattern sits inside a string literal, you need to escape twice (once for the string, once for the regex), so:

- evt.Parsed.http_user_agent matches "Toto-Scraper/.* Acme \\(https://www.acme.com/en/digital-website\\)"

would do the trick :wink:
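For anyone landing here later, here is a minimal sketch of how the patterns from this thread behave once they are read as regular expressions. It uses Python's `re` module as a stand-in for Go's RE2 (an assumption: the two engines agree on these simple patterns, but they are not the same engine), and it assumes `matches` is an unanchored search. The `matches` helper and the variable names are made up for the illustration.

```python
import re

ua = "Toto-Scraper/1.2.5 Acme (https://www.acme.com/en/digital-website)"

# What the regex engine receives after the string literal turns "\\(" into "\(".
escaped = r"Toto-Scraper/.* Acme \(https://www.acme.com/en/digital-website\)"
# The glob-style attempts from the first post, read as regular expressions.
glob_like_full = "Toto-Scraper/*.*.* Acme (https://www.acme.com/en/digital-website)"
glob_like_short = "Toto-Scraper/*.*.*"
prefix_only = "Toto-Scraper.*"


def matches(pattern: str, text: str) -> bool:
    """Unanchored search (assumed to mirror the behaviour of expr's `matches`)."""
    return re.search(pattern, text) is not None


results = {
    "escaped": matches(escaped, ua),
    # Unescaped parentheses form a group, so the literal "(" in the UA is never matched.
    "glob_like_full": matches(glob_like_full, ua),
    # "/*" means "zero or more slashes" and ".*" means "anything", so this matches loosely.
    "glob_like_short": matches(glob_like_short, ua),
    "prefix_only": matches(prefix_only, ua),
}

# The plain string comparison (==) only accepts one exact version of the scraper.
exact = "Toto-Scraper/1.2.5 Acme (https://www.acme.com/en/digital-website)"
results["exact"] = ua == exact

for name, ok in results.items():
    print(f"{name}: {ok}")
```

Under those assumptions, the unescaped full pattern is the only one that does not match the sample user agent, because the parentheses are treated as a group rather than as literal characters.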

It seems it did not work: my scraper got banned…

I have now put:

- evt.Parsed.http_user_agent matches "Toto-Scraper.*"

Let’s see…

Hello,

I ran some tests locally and it worked for me. Can you share some example logs that triggered a ban, plus your whitelist?

Thanks,

Hi Thibault

time="04-11-2021 11:55:47" level=info msg="Ip 34.245.204.174 performed 'crowdsecurity/http-probing' (13 events over 1m33.240032522s) at 2021-11-04 11:55:47.849649143 +0000 UTC m=+73604.010430156"
time="04-11-2021 11:55:48" level=info msg="(3c1f443a494fb47370e37e34680787eerc0sj7zOceruPRSl/crowdsec) crowdsecurity/http-probing by ip 34.245.204.174 (IE) : 4h ban on Ip 34.245.204.174"

Should there be more logs?

The YAML file is located in /etc/crowdsec/parsers/s02-enrich/:

name: crowdsecurity/totoscraper
description: "Whitelist events from toto scraper"
whitelist:
  reason: "toto scraper"
  expression:
   - evt.Parsed.http_user_agent matches "Toto-Scraper/.* Acme \\(https://www.acme.com/en/digital-website\\)"

UA example:

Toto-Scraper/1.2.5 Acme (https://www.acme.com/en/digital-website)

Thanks

Hello,

Sorry, I meant the Apache/nginx logs that trigger the error :smiley:

To be precise, there is no error :wink: (I think you meant “ban”)

34.245.204.174 - - [04/Nov/2021:10:31:43 +0000]   "GET /lycee/contact@toto.fr HTTP/1.1" 404 27909  "ref=https://www.toto.fr/lycee/" "ua=Toto-Scraper/1.2.5 Sototo (https://www.acme.com/en/digital-website)"  
"rLoc=-"  "reqt=3.919" "respt=0.004 : 2.020" "host=www.toto.fr" "cache=MISS" "upstream=10.216.124.44:9010 : 10.216.124.47:9001"  "uheadt=0.004 : 2.012"

(as always, toto and acme for anonymization)

Hi !
This can be closed: the issue was with the initial grok pattern. I guess the user agent was not being captured correctly.
Now I see some whitelisted events:

time="18-11-2021 16:59:19" level=info msg="Event is whitelisted by Expr !" id=misty-dew name=crowdsecurity/anonymousscraper stage=s02-enrich

Thanks !
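To illustrate the conclusion above, here is a rough sketch of why a whitelist on the user agent can never fire when the parsing stage does not capture that field. This is not CrowdSec's grok engine: `extract_ua`, `whitelisted`, and the `"ua=..."` extraction regex are hypothetical stand-ins, Python's `re` replaces Go's RE2, and treating a missing field as an empty string is an assumption.

```python
import re

# Shortened copy of the access-log line posted earlier (custom format with a "ua=" field).
log_line = (
    '34.245.204.174 - - [04/Nov/2021:10:31:43 +0000]   '
    '"GET /lycee/contact@toto.fr HTTP/1.1" 404 27909  '
    '"ref=https://www.toto.fr/lycee/" '
    '"ua=Toto-Scraper/1.2.5 Sototo (https://www.acme.com/en/digital-website)"'
)

WHITELIST_PATTERN = "Toto-Scraper.*"


def extract_ua(line: str) -> dict:
    """Hypothetical parser for this custom format: fills http_user_agent from "ua=..."."""
    found = re.search(r'"ua=(?P<ua>[^"]*)"', line)
    return {"http_user_agent": found.group("ua")} if found else {}


def whitelisted(parsed: dict) -> bool:
    """Mimics the whitelist check; a missing field is treated as "" (assumption)."""
    return re.search(WHITELIST_PATTERN, parsed.get("http_user_agent", "")) is not None


# Case 1: the parser understands the custom log format and fills the field.
captured = extract_ua(log_line)
# Case 2: assumed failure mode, where the grok pattern never fills the field.
not_captured = {}

print("captured:", whitelisted(captured))
print("not captured:", whitelisted(not_captured))
```

In the second case the expression has nothing to match against, so the event goes on to the scenarios and the IP can still be banned, whatever the regex looks like.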