Facebookexternalhit/1.1 hits aggressively

Hi guys

The Facebook scraper is hitting us and our infra hard, sometimes at more than 160 req/s.

Example:

69.171.231.10 - - [19/Aug/2024:00:00:19 +0000] "GET / HTTP/1.1" 200 121871 "ref=-" "ua=facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "rLoc=-" "reqt=0.396" "respt=0.024 : 0.376" "host=www.*************.fr" "cache=MISS" "upstream=10.216.x.x:9010 : 10.216.x.x:9001" "uheadt=0.024 : 0.224"

Two questions:

  • Is there a way to blacklist a user agent, e.g. “ua=facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)”?
  • Apart from the “cscli decisions add” command, is there no way to add this in the CrowdSec configuration?

Thanks :)

You can create a scenario to block these after the first request, but IMO I would set up rate limit rules in nginx first. You can then use the built-in rate limit abuse detection to catch clients that keep exceeding the limit.

# Requests whose key is empty are not rate limited, so only the scraper is counted.
map $http_user_agent $limit_bot {
  default "";
  ~*(facebookexternalhit) $http_user_agent;
}

limit_req_zone $limit_bot zone=bots:10m rate=30r/m;

Then you can apply it to a specific location, or at the http level so it covers everything:

location / {
  limit_req zone=bots burst=4;
  limit_req_status 429;
  ...
}
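
For the http-level variant, here is a minimal sketch, assuming "http zone" means the http {} block of nginx.conf. The map, zone name and rate are the ones from above; the layout and the comments are my own illustration, so adapt them to your config.

```nginx
http {
    map $http_user_agent $limit_bot {
        default "";
        ~*facebookexternalhit $http_user_agent;
    }

    # 30r/m means one request every 2 seconds for the matched key.
    limit_req_zone $limit_bot zone=bots:10m rate=30r/m;

    # Set here, the limit is inherited by every server and location
    # that does not define its own limit_req.
    limit_req zone=bots burst=4;
    limit_req_status 429;

    # server { ... } blocks follow as usual
}
```

Note that the key is the user agent string, so all Facebook scraper IPs share a single 30r/m bucket. If you would rather limit each scraper IP separately, using $binary_remote_addr as the map value instead of $http_user_agent should do that.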