Dear Community,
Environment
- OS: Oracle Linux 9
- OpenResty: v1.29.2.1 (ngx_lua-0.10.29R2, lua-resty-http v0.12)
- CrowdSec: v1.7.6-rpm-pragmatic-amd64-eacc8192
- CrowdSec OpenResty Bouncer: v1.1.1
- AppSec: enabled, with crowdsecurity/virtual-patching, plus OWASP CRS v4.0.0 in out-of-band mode
- Traffic: ~4.5 million requests/week
Description
Under sustained high traffic, the OpenResty bouncer produces a large volume of AppSec timeout errors in the nginx error log, despite the AppSec engine itself processing requests well within the configured timeout thresholds.
grep -a "\[error\]" /path/to/nginx/error/log | sed -E 's/^.*\*[0-9]+ //' | cut -d',' -f1 | sed 's/\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}/<IP>/g' | sort | uniq -c | sort -nr
3897 lua tcp socket read timed out
3897 [lua] crowdsec.lua:622: Allow(): AppSec check: timeout
3897 [lua] crowdsec.lua:541: AppSecCheck(): Fallback because of err: timeout
75 connect() failed (111: Connection refused)
72 [lua] crowdsec.lua:622: Allow(): AppSec check: connection refused
72 [lua] crowdsec.lua:541: AppSecCheck(): Fallback because of err: connection refused
3 [lua] live.lua:39: live_query(): failed to query LAPI http://<IP>:8080/v1/decisions?ip=<IP>: connection refused
3 [lua] crowdsec.lua:608: Allow(): [Crowdsec] bouncer error: request failed: connection refused
The same bouncer configuration on a low-traffic site (~5,000 requests/week) produces zero errors.
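Since the errors appear only under high traffic, bucketing them per minute shows whether they cluster at traffic peaks or are spread evenly. A minimal sketch, assuming the default nginx error-log timestamp prefix (`YYYY/MM/DD HH:MM:SS`); the function name `appsec_timeouts_per_minute` is hypothetical:

```shell
#!/bin/sh
# Count "AppSec check: timeout" errors per minute in an nginx error log.
# Assumes every line starts with the default nginx timestamp,
# e.g. "2024/05/01 12:34:56 [error] ...", so the first 16 characters
# ("2024/05/01 12:34") identify the minute.
appsec_timeouts_per_minute() {
  grep -a 'AppSec check: timeout' "$1" \
    | cut -c1-16 \
    | sort \
    | uniq -c \
    | sort -nr   # busiest minutes first
}
```

Usage: `appsec_timeouts_per_minute /path/to/nginx/error/log | head`. The busiest minutes can then be compared with the request volume for the same minutes in the access log.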
Evidence that AppSec engine latency is not the cause
Prometheus metrics from http://127.0.0.1:6060/metrics confirm the AppSec engine is processing requests well within timeout limits:
# APPSEC_PROCESS_TIMEOUT = 1000ms (default)
cs_appsec_inband_parsing_time_seconds_bucket{le="0.001"} = 2,956,008 # 99.75% of requests under 1ms
cs_appsec_inband_parsing_time_seconds_bucket{le="0.0025"} = 2,962,535 # 99.97% of requests under 2.5ms
cs_appsec_inband_parsing_time_seconds_count = 2,963,517
cs_appsec_inband_parsing_time_seconds_sum = 591s
# → average inband latency: ~0.2ms
The AppSec engine's average in-band latency is ~0.2 ms against a 1000 ms timeout, so rule evaluation time is not the bottleneck.
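The percentages and the average above can be recomputed directly from the metrics endpoint. A minimal sketch, assuming the standard Prometheus text exposition format (one sample per line, value in the last field, no trailing timestamps) and summing across all label sets; the function name `appsec_latency_summary` is hypothetical:

```shell
#!/bin/sh
# Summarise CrowdSec AppSec in-band parsing latency from Prometheus text
# read on stdin. Samples from every label set (e.g. several AppSec engines)
# are summed together.
appsec_latency_summary() {
  awk '
    /^#/ { next }   # skip HELP/TYPE comment lines
    /^cs_appsec_inband_parsing_time_seconds_bucket[{]/ && /le="0\.001"/  { b1  += $NF }
    /^cs_appsec_inband_parsing_time_seconds_bucket[{]/ && /le="0\.0025"/ { b25 += $NF }
    /^cs_appsec_inband_parsing_time_seconds_count/ { count += $NF }
    /^cs_appsec_inband_parsing_time_seconds_sum/   { sum   += $NF }
    END {
      if (count == 0) { print "no samples"; exit 1 }
      printf "requests: %d\n", count
      printf "avg_ms: %.3f\n", sum / count * 1000
      printf "under_1ms_pct: %.2f\n", b1 / count * 100
      printf "under_2.5ms_pct: %.2f\n", b25 / count * 100
    }'
}
```

Usage: `curl -s http://127.0.0.1:6060/metrics | appsec_latency_summary`. Bucket values are cumulative, so dividing each bucket by `_count` gives the share of requests at or under that bound.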
Steps to reproduce
- Deploy cs-openresty-bouncer with AppSec enabled
- Generate sustained traffic of ~80+ requests/second to the proxied application
- Observe `AppSec check: timeout` errors in the nginx error log; the affected requests take the fallback path and are not inspected by the WAF
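Sustained traffic at a fixed rate can be approximated with curl when no benchmarking tool is at hand. A rough sketch: the function name `generate_load` and the `REQUEST_CMD` override are hypothetical, and all requests in each one-second slot start together, so arrivals are bursty rather than evenly spaced (a dedicated tool such as `wrk` or `hey` paces load more evenly):

```shell
#!/bin/sh
# Fire RATE requests per second at URL for SECONDS seconds.
# Usage: generate_load URL RATE SECONDS
# Set REQUEST_CMD to a command or function name to replace the default curl call.
# Variables are prefixed with gl_ because POSIX sh has no local variables.
generate_load() {
  gl_url=$1 gl_rate=$2 gl_secs=$3
  gl_s=0
  while [ "$gl_s" -lt "$gl_secs" ]; do
    gl_i=0
    while [ "$gl_i" -lt "$gl_rate" ]; do
      if [ -n "${REQUEST_CMD:-}" ]; then
        "$REQUEST_CMD" "$gl_url" &
      else
        curl -s -o /dev/null "$gl_url" &   # discard the response body
      fi
      gl_i=$((gl_i + 1))
    done
    sleep 1
    gl_s=$((gl_s + 1))
  done
  wait   # let in-flight requests finish
}
```

Usage: `generate_load http://<proxied-host>/ 80 300` keeps roughly 80 requests/second going for five minutes.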
Expected behavior
Sustained traffic of ~80+ requests/second is inspected by the WAF without producing timeout errors.
Additional context
- Bouncer config: all timeout values at defaults (`APPSEC_CONNECT_TIMEOUT=100`, `APPSEC_SEND_TIMEOUT=100`, `APPSEC_PROCESS_TIMEOUT=1000`, in milliseconds)
- Increasing `worker_connections` in `nginx.conf` did not resolve the issue
Could you please help identify the root cause of this problem and fix it?