How to do a SIMPLE log read and IP ban?

I am trying to read through the documentation on creating custom parsers and scenarios but I am getting incredibly lost.

What I am trying to do is extremely simple and I think the documentation and yaml configs are overly complicated for my needs, so I need some help creating the absolute barebones configs for my idea.

All I want to do is read /var/log/nginx/access.log, grok out every remote address IP and ban it for a week. (The log file is only for the default server block, so it only ever contains bots)
Then I want to deliver all these IP addresses into my nftables bouncer.

But I am getting confused with all these stages and statics and files.

This is what I have so far:

/etc/crowdsec/hub/custom/config.yaml

parsers:
  - ./parsers/s01-parse/nginx-ip.yaml
scenarios:
  - ./scenarios/nginx-ban.yaml

/etc/crowdsec/parsers/s01-parse/nginx-ip.yaml

filter: 1 == 1
debug: true
onsuccess: continue
name: nginx-ip
description: "Parse nginx logs"
grok:
  pattern: ^%{IP:ip}'
  apply_on: message
statics:
  - parsed: ip
    value: yes

/etc/crowdsec/scenarios/nginx-ban.yaml

type: trigger
name: nginx-ban
description: "Ban IP addresses from Nginx default server logs"
filter: "evt.Parsed.ip"
labels:
  service: nginx
  remediation: true

When testing the log file with explain, I get Line 0/1 is missing evt.StrTime. Why do I need a time for this? I just want to ban the IP, why does it need a timestamp for that? Is there a way to manually set the StrTime to the current system time upon parsing?

The grok pattern expects the IP to be the first parseable item on the log line, for example:

192.168.1.1 {rest of data}

The grok pattern already sets evt.Parsed.ip to the captured value on its own, so setting the following

statics:
  - parsed: ip
    value: yes

will replace the data that is parsed by the grok with the literal value "yes". Since the grok already sets the parsed attribute, you don't need this static. We do, however, recommend setting the following:

- meta: source_ip
  expression: evt.Parsed.ip

This is used by the s02 nodes that follow as the remote IP address. If you use the default nginx log format, you can use the nginx parser we provide and simply add a custom scenario:

type: trigger
name: nginx-ban
description: "Ban IP addresses from Nginx default server logs"
filter: "evt.Meta.service == 'http' && 'source_ip' in evt.Meta"
groupby: evt.Meta.source_ip
labels:
  service: nginx
  remediation: true

But be aware that the trigger bucket will issue a ban for any log line that is not previously whitelisted, so if you can provide context on what you are trying to achieve, we can alter the scenario to prevent false positives.
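
Putting the parser advice above together, a minimal custom parser might look like the sketch below. It assumes an s00 parser has already populated evt.Parsed.program and evt.Parsed.message, and that the log source is labelled with type nginx. The file path and parser name are taken from the question; the service value is an assumption chosen to match the scenario filter above.

```yaml
# /etc/crowdsec/parsers/s01-parse/nginx-ip.yaml (path from the question)
# Sketch only: assumes an s00 parser has set evt.Parsed.program.
filter: "evt.Parsed.program == 'nginx'"
onsuccess: next_stage
name: nginx-ip
description: "Extract the client IP from nginx default-server logs"
grok:
  # The client IP is the first item on each access log line.
  pattern: '^%{IP:ip}'
  apply_on: message
statics:
  # Expose the captured IP under the name later stages and scenarios expect.
  - meta: source_ip
    expression: evt.Parsed.ip
  # Assumption: set so that a filter on evt.Meta.service == 'http' matches.
  - meta: service
    value: http
```

With the provided nginx parser installed and working, none of this is needed, since that parser already sets these meta fields.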

I did previously try the NGINX collection, but when I run cscli explain it comes back with the same error, Line 0/1 is missing evt.StrTime. I did verify that my NGINX log format works with the NGINX grok pattern, but that's another issue.

For my purposes, I do want to issue bans for every line in the log file. My NGINX server has default server blocks that do not serve the main domain and only bots will visit them. This server block is the only one that sends logs to the access.log file. So the access.log file only has malicious IPs in it, hence why I want to ban any IP that shows up there.

cscli explain seems to complain that I don't have an evt.StrTime. Should that be set somewhere using meta/parsed in the parser yaml?

Just so I fully understand: I guess each domain is logging to a separate file, so each line in the default /var/log/nginx/access.log is what you want to act on?

cscli explain complains, but it's not an error. It only warns that time machine won't work as intended, which doesn't matter in your case.
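
For reference, evt.StrTime is normally filled in by the s01 parser from a captured field, and dateparse-enrich in s02 then parses it. If a custom parser ever needs it, a fragment along these lines could work. This is a sketch assuming the default nginx access log layout; the capture names ip, user and time_local are illustrative choices.

```yaml
# Sketch only: fragment of a custom s01 parser, not a complete file.
grok:
  # Capture the client IP and the bracketed timestamp at the start of the line.
  pattern: '^%{IP:ip} - %{NOTSPACE:user} \[%{HTTPDATE:time_local}\]'
  apply_on: message
statics:
  # Hand the raw timestamp string to the later stages.
  - target: evt.StrTime
    expression: evt.Parsed.time_local
```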

I would be more inclined to debug why the default nginx parser is or isn't working than to reinvent the wheel with a custom parser. Then we can specify in the custom scenario that it should only trigger on the default nginx log path, since we expose evt.Meta.datasource_path, which should be the file path.

├ s00-raw
|       └ 🟢 crowdsecurity/non-syslog (+5 ~8)
|               └ update evt.ExpectMode : %!s(int=0) -> 1
|               └ update evt.Stage :  -> s01-parse
|               └ update evt.Line.Raw :  -> 111.222.333.444 - - [11/Mar/2022:07:41:47 +0100] "\x16\x03\x01\x00\xCA\x01\x00\x00\xC6\x03\x03\xC3\xA3\xF6MU\xBAZJ2\xBA\xD3\xCB\xAD\xA9\x92~j\x0E<\x8Cf,\xBB\x9A)\xD4\xAD53\xF3\x04\x0E\x00\x00h\xCC\x14\xCC\x13\xC0/\xC0+\xC00\xC0,\xC0\x11\xC0\x07\xC0'\xC0#\xC0\x13\xC0\x09\xC0(\xC0$\xC0\x14\xC0" 400 157 "-" "-"
|               └ update evt.Line.Src :  -> /tmp/cscli_explain3707358208/cscli_test_tmp.log
|               └ update evt.Line.Time : 0001-01-01 00:00:00 +0000 UTC -> 2024-06-14 13:46:13.001368385 +0000 UTC
|               └ create evt.Line.Labels.type : nginx
|               └ update evt.Line.Process : %!s(bool=false) -> true
|               └ update evt.Line.Module :  -> file
|               └ create evt.Parsed.message : 111.222.333.444 - - [11/Mar/2022:07:41:47 +0100] "\x16\x03\x01\x00\xCA\x01\x00\x00\xC6\x03\x03\xC3\xA3\xF6MU\xBAZJ2\xBA\xD3\xCB\xAD\xA9\x92~j\x0E<\x8Cf,\xBB\x9A)\xD4\xAD53\xF3\x04\x0E\x00\x00h\xCC\x14\xCC\x13\xC0/\xC0+\xC00\xC0,\xC0\x11\xC0\x07\xC0'\xC0#\xC0\x13\xC0\x09\xC0(\xC0$\xC0\x14\xC0" 400 157 "-" "-"
|               └ create evt.Parsed.program : nginx
|               └ update evt.Time : 0001-01-01 00:00:00 +0000 UTC -> 2024-06-14 13:46:13.00140271 +0000 UTC
|               └ create evt.Meta.datasource_path : /tmp/cscli_explain3707358208/cscli_test_tmp.log
|               └ create evt.Meta.datasource_type : file
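
A scenario restricted to that one file could be sketched as follows. It assumes the provided nginx parser sets evt.Meta.service to 'http' and that the live log path is /var/log/nginx/access.log. Note that under cscli explain the datasource_path is a temporary file, as the output above shows, so a path filter like this would only be expected to match on the real acquisition.

```yaml
# /etc/crowdsec/scenarios/nginx-ban.yaml (path from the question)
# Sketch only: bans every source IP seen in the default server's access log.
type: trigger
name: nginx-ban
description: "Ban IP addresses from Nginx default server logs"
# Only react to HTTP events read from the default access log.
filter: "evt.Meta.service == 'http' && evt.Meta.datasource_path == '/var/log/nginx/access.log' && 'source_ip' in evt.Meta"
groupby: evt.Meta.source_ip
labels:
  service: nginx
  remediation: true
```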

My domain logs are remote, I'm not interested in adding them to crowdsec just yet. But the default server block is the one that gets hammered with bots and malformed requests often, so I'd like to get that fed in to make the most difference. Those logs are in /var/log/nginx/access.log.

The problem I was running into, assuming the StrTime is not a problem, was that cscli explain on my existing logs stopped directly at the s01 parser when using the NGINX collection. I guessed that maybe my log format was different from the one the parser was built for.

Currently with my custom stuff this is the output from explain:

line: 138.197.200.15 - - [14/Jun/2024:02:43:05 -0700] "\x16\x03\x01\x00{\x01\x00\x00w\x03\x039fY\x9E\xD8\xD8\x02\x18v\xEA\x07\xBD\xEC\xEE\xE6F14!2;S\xC4\x02\xA5\xC9\x1B\x96o{j\x10\x00\x00\x1A\xC0/\xC0+\xC0\x11\xC0\x07\xC0\x13\xC0\x09\xC0\x14\xC0" 400 150 "-" "-"
        ├ s01-parse
        |       └ 🔴 nginx-ip
        └-------- parser failure 🔴

If I go ahead and install the NGINX collection, this is the output of the explain:

line: 138.197.200.15 - - [14/Jun/2024:02:43:05 -0700] "\x16\x03\x01\x00{\x01\x00\x00w\x03\x039fY\x9E\xD8\xD8\x02\x18v\xEA\x07\xBD\xEC\xEE\xE6F14!2;S\xC4\x02\xA5\xC9\x1B\x96o{j\x10\x00\x00\x1A\xC0/\xC0+\xC0\x11\xC0\x07\xC0\x13\xC0\x09\xC0\x14\xC0" 400 150 "-" "-"
        ├ s01-parse
        |       ├ 🔴 nginx-ip
        |       └ 🔴 crowdsecurity/nginx-logs
        └-------- parser failure 🔴

I understand wanting to debug the standard NGINX pipeline that is provided, but it seems either solution is failing at the same issue.

I would appreciate help on the custom scenario though.

You seem to be missing a lot of the configuration that we install by default. You should at least install the following parsers:

s00:
https://app.crowdsec.net/hub/author/crowdsecurity/configurations/syslog-logs
s02:
https://app.crowdsec.net/hub/author/crowdsecurity/configurations/dateparse-enrich
https://app.crowdsec.net/hub/author/crowdsecurity/configurations/geoip-enrich

The naming of syslog-logs is a bit confusing, as it contains the default file-based parser tagged on to the bottom of it (it is hard for us to change this now, as it has been the default since the inception of crowdsec).

The s00 will get it to parse correctly. You need at least one s02 enricher for it to pass the stages, so you can choose either, but we recommend installing both by default.
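
For the s00 stage to hand lines on to the nginx parser, the log source also needs the nginx type label. A minimal acquisition entry might look like this sketch, which assumes the default /etc/crowdsec/acquis.yaml location:

```yaml
# /etc/crowdsec/acquis.yaml (default location assumed)
filenames:
  - /var/log/nginx/access.log
labels:
  # Shows up as evt.Line.Labels.type and, after s00, as evt.Parsed.program.
  type: nginx
```

When testing with cscli explain, the --type nginx option plays the same role as this label.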

Looks like that was the problem

WARN Line 0/1 is missing evt.StrTime. It is most likely a mistake as it will prevent your logs to be processed in time-machine/forensic mode. 
line: 107.122.81.115 - - [15/Jun/2024:19:48:08 -0700] "GET / HTTP/1.1" 400 150 "-" "-"
        ├ s00-raw
        |       ├ 🔴 crowdsecurity/syslog-logs
        |       └ 🟢 crowdsecurity/non-syslog (+5 ~8)
        ├ s01-parse
        |       └ 🟢 crowdsecurity/nginx-logs (+17 ~1)
        ├ s02-enrich
        |       ├ 🔴 crowdsecurity/dateparse-enrich
        |       ├ 🟢 crowdsecurity/geoip-enrich (+13)
        |       ├ 🔴 crowdsecurity/http-logs
        |       └ 🟢 crowdsecurity/whitelists (+2)
        ├-------- parser success 🟢
        ├ Scenarios
                └ 🟢 nginx-ban

Now IPs in the access.log are showing up in the ip set

nft list table crowdsec | grep "107.122.81.115"
     107.0.200.227 timeout 6d22h59m56s892ms expires 4d21h30m56s988ms, 107.122.81.115 timeout 3d7h59m54s160ms expires 3d7h59m34s244ms,

Thank you very much for your help!

Remember that the configured scenario triggers for all log types, so if you plan to include additional logs in the future, you must also add additional filtering.
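
As for the one-week ban from the original question, the duration is not part of the scenario. CrowdSec takes it from the profile that turns alerts into decisions. A minimal sketch, assuming the default /etc/crowdsec/profiles.yaml location and the stock profile name:

```yaml
# /etc/crowdsec/profiles.yaml (default location assumed)
name: default_ip_remediation
filters:
  # Apply to alerts from scenarios labelled remediation: true, scoped to an IP.
  - Alert.Remediation == true && Alert.GetScope() == "Ip"
decisions:
  - type: ban
    duration: 168h   # 7 days, expressed in hours
on_success: break
```

This applies to every scenario that reaches this profile, not only nginx-ban, so other bans would also last a week unless a more specific profile is placed before it.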