CrowdSec AppSec produces many timeout errors in OpenResty logs under high traffic

Dear Community,

Environment

  • OS: Oracle Linux 9
  • OpenResty: v1.29.2.1 (ngx_lua-0.10.29R2, lua-resty-http v0.12)
  • CrowdSec: v1.7.6-rpm-pragmatic-amd64-eacc8192
  • CrowdSec OpenResty Bouncer: v1.1.1
  • AppSec: enabled, crowdsecurity/virtual-patching + OWASP CRS v4.0.0 - out-of-band
  • Traffic: ~4.5 million requests/week

Description

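Under sustained high traffic, the OpenResty bouncer logs a large number of AppSec timeout errors in the nginx error log (3,897 in the summary below). This happens even though the AppSec engine itself processes requests well within the configured timeouts. The timeouts occur while the bouncer is reading the AppSec response ("lua tcp socket read timed out", with the same count), and after each one the bouncer falls back.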

Summary of [error] entries grouped by message, with IP addresses masked:

grep -a "\[error\]" /path/to/nginx/error/log | sed -E 's/^.*\*[0-9]+ //' | cut -d',' -f1 | sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/<IP>/g' | sort | uniq -c | sort -nr
   3897 lua tcp socket read timed out
   3897 [lua] crowdsec.lua:622: Allow(): AppSec check: timeout
   3897 [lua] crowdsec.lua:541: AppSecCheck(): Fallback because of err: timeout
     75 connect() failed (111: Connection refused)
     72 [lua] crowdsec.lua:622: Allow(): AppSec check: connection refused
     72 [lua] crowdsec.lua:541: AppSecCheck(): Fallback because of err: connection refused
      3 [lua] live.lua:39: live_query(): failed to query LAPI http://<IP>:8080/v1/decisions?ip=<IP>: connection refused
      3 [lua] crowdsec.lua:608: Allow(): [Crowdsec] bouncer error: request failed: connection refused
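For larger logs, the same summary can also be produced in Python. The sketch below is a hypothetical helper (summarize_errors.py is an illustrative name, not part of CrowdSec). It assumes the standard nginx error-log layout, in which the message follows the connection id "*NNN ". It masks IPv4 addresses before counting, so messages that differ only by address are grouped together regardless of sort order.

```python
import re
import sys
from collections import Counter

# Everything up to and including the nginx connection id ("*12345 ") is
# stripped, mirroring the sed expression in the shell pipeline (greedy match).
CONN_PREFIX = re.compile(r"^.*\*\d+ ")
IPV4 = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")


def summarize(lines):
    """Count [error] messages by text, masking IPv4 addresses before counting."""
    counts = Counter()
    for line in lines:
        if "[error]" not in line:
            continue
        msg = CONN_PREFIX.sub("", line.rstrip("\n"))
        msg = msg.split(",", 1)[0]   # drop the ", client: ..., server: ..." context
        msg = IPV4.sub("<IP>", msg)  # mask addresses so identical messages group together
        counts[msg] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_errors.py /path/to/nginx/error/log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for msg, n in summarize(fh).most_common():
            print(f"{n:7d} {msg}")
```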

The same bouncer configuration on a low-traffic site (~5,000 requests/week) produces zero errors.
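As a rough sanity check on the traffic figures, the weekly counts can be converted to average request rates. This is plain arithmetic and assumes traffic is spread evenly over the week, which real traffic is not. About 4.5 million requests/week averages roughly 7.4 requests/second. The ~80+ requests/second used in the reproduction steps therefore presumably describes peak periods rather than the weekly average; that is an inference, not a measured value.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800 s


def weekly_to_rps(requests_per_week):
    """Average request rate implied by a weekly request count."""
    return requests_per_week / SECONDS_PER_WEEK


def rps_to_weekly(rps):
    """Weekly request count implied by a constant request rate."""
    return rps * SECONDS_PER_WEEK


for label, weekly in [("high-traffic site", 4_500_000), ("low-traffic site", 5_000)]:
    print(f"{label}: {weekly_to_rps(weekly):.3f} req/s on average")
print(f"80 req/s sustained for a week: {rps_to_weekly(80):,} requests")
# prints:
# high-traffic site: 7.440 req/s on average
# low-traffic site: 0.008 req/s on average
# 80 req/s sustained for a week: 48,384,000 requests
```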

Evidence that AppSec engine latency is not the cause

Prometheus metrics from http://127.0.0.1:6060/metrics confirm the AppSec engine is processing requests well within timeout limits:

# APPSEC_PROCESS_TIMEOUT = 1000ms (default)

cs_appsec_inband_parsing_time_seconds_bucket{le="0.001"} = 2,956,008  # 99.75% of requests under 1ms
cs_appsec_inband_parsing_time_seconds_bucket{le="0.0025"} = 2,962,535  # 99.97% of requests under 2.5ms
cs_appsec_inband_parsing_time_seconds_count = 2,963,517
cs_appsec_inband_parsing_time_seconds_sum = 591s
# → average inband latency: ~0.2ms

The AppSec engine's average in-band latency is ~0.2 ms against a 1000 ms timeout, and 99.97% of requests complete within 2.5 ms, so rule evaluation time is not the bottleneck.
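The percentages and the average above can be recomputed directly from the /metrics output. The sketch below parses Prometheus text-format histogram lines for cs_appsec_inband_parsing_time_seconds. It assumes any labels other than "le" can be ignored and their series summed. The sample reuses the numbers from the excerpt above rather than a live scrape.

```python
import re
from collections import defaultdict

METRIC = "cs_appsec_inband_parsing_time_seconds"
# Prometheus text exposition lines: name{labels} value. Labels other than "le"
# (e.g. per-engine labels, if present) are ignored and series are summed.
LINE = re.compile(r"^(\w+)(?:\{([^}]*)\})?\s+(\S+)")
LE = re.compile(r'le="([^"]+)"')


def histogram_summary(text, metric=METRIC):
    """Return ({le: % of requests <= le}, mean latency in ms) for one histogram."""
    buckets = defaultdict(float)
    count = total_seconds = 0.0
    for line in text.splitlines():
        m = LINE.match(line)  # comment lines ("# HELP", "# TYPE") do not match
        if not m:
            continue
        name, labels, value = m.group(1), m.group(2) or "", float(m.group(3))
        if name == metric + "_bucket":
            le = LE.search(labels)
            if le:
                buckets[le.group(1)] += value
        elif name == metric + "_count":
            count += value
        elif name == metric + "_sum":
            total_seconds += value
    if not count:
        raise ValueError(f"no {metric}_count sample found")
    shares = {le: 100.0 * v / count for le, v in buckets.items()}
    return shares, 1000.0 * total_seconds / count


# Values copied from the metrics excerpt above (sum rounded to 591 s).
SAMPLE = """\
cs_appsec_inband_parsing_time_seconds_bucket{le="0.001"} 2956008
cs_appsec_inband_parsing_time_seconds_bucket{le="0.0025"} 2962535
cs_appsec_inband_parsing_time_seconds_count 2963517
cs_appsec_inband_parsing_time_seconds_sum 591
"""

shares, mean_ms = histogram_summary(SAMPLE)
print(f"{shares['0.001']:.2f}% <= 1 ms, {shares['0.0025']:.2f}% <= 2.5 ms, mean {mean_ms:.2f} ms")
# prints: 99.75% <= 1 ms, 99.97% <= 2.5 ms, mean 0.20 ms
```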

Steps to reproduce

  1. Deploy cs-openresty-bouncer with AppSec enabled
  2. Generate sustained traffic of ~80+ requests/second to the proxied application
  3. Observe "AppSec check: timeout" errors in the nginx error log; the affected requests fall back and are not inspected by the WAF
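Step 2 can be approximated with a small open-loop load generator. The sketch below is hypothetical and uses only the Python standard library. TARGET_URL is a placeholder for a URL proxied by the OpenResty bouncer, and the worker count is an assumption. It submits requests at a fixed pace and records each HTTP status or error, so blocked (403) and failed requests can be counted afterwards.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical target: replace with a URL proxied by the OpenResty bouncer.
TARGET_URL = "http://127.0.0.1/"


def fetch(url=TARGET_URL, timeout=5.0):
    """Send one GET request and return the HTTP status, or the error as text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # non-2xx responses, e.g. a 403 when a request is blocked
    except OSError as exc:  # connection errors and socket timeouts
        return repr(exc)


def run_load(rate, duration, send=fetch, workers=128):
    """Start `send()` about `rate` times per second for `duration` seconds."""
    total = int(round(rate * duration))
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for n in range(total):
            delay = start + n / rate - time.monotonic()
            if delay > 0:
                time.sleep(delay)  # pace submissions to the target rate
            futures.append(pool.submit(send))
        return [f.result() for f in futures]
```

For example, collections.Counter(run_load(rate=80, duration=300)) summarizes five minutes at 80 requests/second. The worker count should comfortably exceed rate × response time; otherwise submissions queue inside the pool and the effective rate drops below the target.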

Expected behavior

At ~80+ requests/second, all requests are checked by AppSec without timeouts, and no errors are logged.

Additional context

  • Bouncer config: all timeout values at their defaults, in milliseconds (APPSEC_CONNECT_TIMEOUT=100, APPSEC_SEND_TIMEOUT=100, APPSEC_PROCESS_TIMEOUT=1000)
  • Increasing worker_connections in nginx.conf did not resolve the issue
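To confirm that the running configuration really uses these defaults, a small hypothetical helper can read the bouncer's config file and print the effective AppSec timeouts. It makes three assumptions: the file uses plain KEY=VALUE lines with "#" comments, an unset or empty key falls back to the default listed above, and the file lives at the path shown in the usage comment.

```python
import sys

# Defaults (milliseconds) as listed in the report.
DEFAULTS = {
    "APPSEC_CONNECT_TIMEOUT": 100,
    "APPSEC_SEND_TIMEOUT": 100,
    "APPSEC_PROCESS_TIMEOUT": 1000,
}


def appsec_timeouts(text):
    """Effective AppSec timeouts from KEY=VALUE text; unset or empty keys keep the default."""
    values = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in values and value:
            values[key] = int(value)
    return values


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed path): python appsec_timeouts.py /etc/crowdsec/bouncers/crowdsec-openresty-bouncer.conf
    with open(sys.argv[1], encoding="utf-8") as fh:
        for key, ms in appsec_timeouts(fh.read()).items():
            print(f"{key}={ms} ms")
```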

Could you please help identify the root cause of this problem and how to fix it?