Too many open files while talking to LAPI

Hi !

Yesterday, we got an outage on production. I am trying to figure if those errors are the cause or consequence of our outage

2021/08/24 18:55:58 [error] 7744#7744: *1378580 [lua] CrowdSec.lua:103: allowIp(): [Crowdsec] Http error Too many open files while talking to LAPI (http://127.0.0.1:8080/v1/decisions?ip=87.87.87.87), client: 87.87.87.87, server: www.*, request: "GET / HTTP/1.1", host: "www.toto.fr", referrer: "https://www.google.com/"

root@my_wonderful_server:/var/log/nginx# ulimit -n
1024
root@my_wonderful_server:/var/log/nginx# ps -eaf|grep nginx
root        3472       1  0 Aug23 ?        00:00:00 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
www-data    7743    3472  0 Aug23 ?        00:23:19 nginx: worker process
www-data    7744    3472  0 Aug23 ?        00:27:19 nginx: worker process
www-data    7745    3472  0 Aug23 ?        00:00:00 nginx: cache manager process
root       20617   18729  0 13:59 pts/0    00:00:00 grep --color=auto nginx
root@my_wonderful_server:/var/log/nginx# cat /proc/3472/limits |grep open
Max open files            1024                 524288               files     
root@my_wonderful_server:/var/log/nginx# cat /proc/7743/limits |grep open
Max open files            5000                 5000                 files     
root@my_wonderful_server:/var/log/nginx#

I checked the prometheus graph, and it seems ok (but this is the open files for Crowdsec, ,not for nginx right ?)

What do you think ?

Hello,

Can you paste the crowdsec log when this error occured please?
Also, what is your crowdsec version?

About the outage, the nginx bouncer should not block any traffic if an error occur.

sure, but no interesting logs on crowdsec.log (issue occured starting 18:55)
crowdsec 1.1.1

time="24-08-2021 18:47:27" level=info msg="Signal push: 1 signals to push"
time="24-08-2021 18:48:41" level=warning msg="390 existing LeakyRoutine"
time="24-08-2021 18:51:43" level=warning msg="393 existing LeakyRoutine"
time="24-08-2021 18:55:00" level=warning msg="330 existing LeakyRoutine"
time="24-08-2021 19:00:05" level=warning msg="348 existing LeakyRoutine"
time="24-08-2021 19:04:36" level=warning msg="395 existing LeakyRoutine"
time="24-08-2021 19:09:18" level=warning msg="364 existing LeakyRoutine"

crowdsec_api.log

time="24-08-2021 18:55:57" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:57 UTC] \"GET /v1/decisions?ip=217.67.150.56 HTTP/1.1 200 1.401921ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:57" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:57 UTC] \"GET /v1/decisions?ip=91.188.215.198 HTTP/1.1 200 1.387012ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:57" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:57 UTC] \"GET /v1/decisions?ip=176.103.88.57 HTTP/1.1 200 1.3355ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:57" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:57 UTC] \"GET /v1/decisions?ip=54.36.148.167 HTTP/1.1 200 1.26327ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:57" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:57 UTC] \"GET /v1/decisions?ip=88.99.136.47 HTTP/1.1 200 1.229187ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:57" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:57 UTC] \"GET /v1/decisions?ip=193.251.162.188 HTTP/1.1 200 1.349626ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:58" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:58 UTC] \"GET /v1/decisions?ip=176.103.85.167 HTTP/1.1 200 1.266484ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:58" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:58 UTC] \"GET /v1/decisions?ip=176.168.2.84 HTTP/1.1 200 1.268158ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:58" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:58 UTC] \"GET /v1/decisions?ip=47.97.80.51 HTTP/1.1 200 1.380163ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:58" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:58 UTC] \"GET /v1/decisions?ip=176.222.57.46 HTTP/1.1 200 783.348µs \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:58" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:58 UTC] \"GET /v1/decisions?ip=217.96.158.67 HTTP/1.1 200 1.249517ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:58" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:58 UTC] \"GET /v1/decisions?ip=64.147.91.228 HTTP/1.1 200 1.237303ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:58" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:58 UTC] \"GET /v1/decisions?ip=104.168.248.159 HTTP/1.1 200 1.480127ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:58" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:58 UTC] \"GET /v1/decisions?ip=54.216.50.216 HTTP/1.1 200 1.362095ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:59" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:59 UTC] \"GET /v1/decisions?ip=173.254.202.176 HTTP/1.1 200 1.262958ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:59" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:59 UTC] \"GET /v1/decisions?ip=91.188.215.198 HTTP/1.1 200 1.434186ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:59" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:59 UTC] \"GET /v1/decisions?ip=54.216.50.216 HTTP/1.1 200 1.256018ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:59" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:59 UTC] \"GET /v1/decisions?ip=176.103.88.57 HTTP/1.1 200 1.248048ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:59" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:59 UTC] \"GET /v1/decisions?ip=31.37.217.175 HTTP/1.1 200 1.174457ms \"crowdsec-nginx-bouncer/v0.0.6\" \""
time="24-08-2021 18:55:59" level=info msg="127.0.0.1 - [Tue, 24 Aug 2021 18:55:59 UTC] \"GET /v1/decisions?ip=176.222.57.46 HTTP/1.1 200 1.396987ms \"crowdsec-nginx-bouncer/v0.0.6\" \""

nginx log

my_log_file.log.2.gz:2021/08/24 18:55:58 [error] 7744#7744: *1378580 [lua] CrowdSec.lua:103: allowIp(): [Crowdsec] Http error Too many open files while talking to LAPI (http://127.0.0.1:8080/v1/decisions?ip=87.88.186.124), client: 87.88.186.124, server: www.*, request: "GET / HTTP/1.1", host: "www.toto.fr", referrer: "https://www.google.com/"
my_log_file.log.2.gz:2021/08/24 18:55:58 [error] 7744#7744: *1378580 [lua] CrowdSec.lua:103: allowIp(): [Crowdsec] Http error Too many open files while talking to LAPI (http://127.0.0.1:8080/v1/decisions?ip=87.88.186.124), client: 87.88.186.124, server: www.*, request: "GET / HTTP/1.1", host: "www.ttoto.fr", referrer: "https://www.google.com/"
my_log_file.log.2.gz:2021/08/24 18:55:58 [error] 7744#7744: *1378624 [lua] CrowdSec.lua:103: allowIp(): [Crowdsec] Http error Too many open files while talking to LAPI (http://127.0.0.1:8080/v1/decisions?ip=91.149.244.152), client: 91.149.244.152, server: www.*, request: "POST /xmlrpc.php HTTP/1.1", host: "www.toto.com"
my_log_file.log.2.gz:2021/08/24 18:55:58 [error] 7744#7744: *1378624 [lua] CrowdSec.lua:103: allowIp(): [Crowdsec] Http error Too many open files while talking to LAPI (http://127.0.0.1:8080/v1/decisions?ip=91.149.244.152), client: 91.149.244.152, server: www.*, request: "POST /xmlrpc.php HTTP/1.1", host: "www.toto.com"
my_log_file.log.2.gz:2021/08/24 18:55:58 [error] 7744#7744: *1378625 [lua] CrowdSec.lua:103: allowIp(): [Crowdsec] Http error Too many open files while talking to LAPI (http://127.0.0.1:8080/v1/decisions?ip=201.193.165.243), client: 201.193.165.243, server: www.*, request: "GET /wp-content/uploads/sites/7405/2020/04/volets-roulants-2.jpg HTTP/1.1", host: "www.toto.fr", referrer: "https://www.google.com/"
my_log_file.log.2.gz:2021/08/24 18:55:58 [error] 7744#7744: *1378625 [lua] CrowdSec.lua:103: allowIp(): [Crowdsec] Http error Too many open files while talking to LAPI (http://127.0.0.1:8080/v1/decisions?ip=201.193.165.243), client: 201.193.165.243, server: www.*, request: "GET /wp-content/uploads/sites/7405/2020/04/volets-roulants-2.jpg HTTP/1.1", host: "www.toto.fr", referrer: "https://www.google.com/"
my_log_file.log.2.gz:2021/08/24 18:55:58 [error] 7744#7744: *1378633 [lua] CrowdSec.lua:103: allowIp(): [Crowdsec] Http error Too many open files while talking to LAPI (http://127.0.0.1:8080/v1/decisions?ip=88.122.239.240), client: 88.122.239.240, server: www.*, request: "GET /wp-content/plugins/woocommerce/assets/css/woocommerce-smallscreen.css?ver=3.9.3 HTTP/1.1", host: "www.toto.com", referrer: "https://www.toto.com/velo-collection/"
my_log_file.log.2.gz:2021/08/24 18:55:58 [error] 7744#7744: *1378633 [lua] CrowdSec.lua:103: allowIp(): [Crowdsec] Http error Too many open files while talking to LAPI (http://127.0.0.1:8080/v1/decisions?ip=88.122.239.240), client: 88.122.239.240, server: www.*, request: "GET /wp-content/plugins/woocommerce/assets/css/woocommerce-smallscreen.css?ver=3.9.3 HTTP/1.1", host: "www.toto.com", referrer: "https://www.toto.com/velo-collection/"

Hi again
We got again those too many open files

# grep -ri crowdsec *.log|grep "Too many open"|wc -l
2457

Do you think max open files is reached on nginx side, or is it on crowdsec side ?

Hello,

I never encountered this issue, would you be able to provide a bit more context on “how” it happens ? If we’re able to reproduce it on our side it might be a lot easier. In the worst case, can you jump onto crowdsec-project/community - Gitter one of this days so we can try to debug this together ?

edit: at second thought it seems the FD limit might be reached on nginx side. In the logs you pasted the IPs that appear during your error (on nginx side : too many fd) do not appear in the local API log, making it look like the request never leaved nginx.

edit2: in your nginx log, something is awkward - the logs are repeated twice for each IP, do you have a specific setup ?

Thanks,

If I remember well, we were hardly attacked on a non watched log file (so maybe Nginx was stacking connections, and was out of fd…)

And for the duplicate, yes this is strange…

I did a “cscli bouncers list” and only 1 bouncer registered on each machine…