Adding exceptions for allowed 404 errors

Hi, I’m pretty new to CrowdSec and I’ve hit a problem I can’t find an answer to in the documentation.
I’m using CrowdSec with HAProxy in front of some remote admin software installed on PCs at remote sites. This means we can get quite a lot of connections from the same IP: they come from different PCs, but the source IP is shared.

I want to avoid generating bans under certain circumstances. iptables blocking is working well, but I would like some kind of URL whitelist.

The remote admin software can try to download files to see whether they are present. If they are not, this generates 404 errors, and I think CrowdSec is catching these 404s and treating them as a problem.

cscli decisions list reports that the ban is due to crowdsecurity/http-probing.

Can I add an exception list for valid URLs, and for URLs which can be queried but may return 404s?

What is this even called within Crowdsec?

I am in the middle of trying to add a whitelist, but I don’t understand the mechanism of how to get Crowdsec to pull in my new whitelist.yaml file. I assume a reference to my new file has to be added to one of the existing config files, but I’m not yet sure of how to do this.
Does a whitelist help with ignoring urls which return 404 errors?

Regards
Ian

Hello, yes, you can whitelist specific URLs.
Here is the troubleshooting documentation about this: Troubleshooting Guide | CrowdSec

For your information, the status code is stored either in evt.Parsed.status or in evt.Meta.http_status.
You can also see all the parsed fields with cscli explain: cscli explain | CrowdSec

but I don’t understand the mechanism of how to get Crowdsec to pull in my new whitelist.yaml file. I assume a reference to my new file has to be added to one of the existing config files, but I’m not yet sure of how to do this.

Can you please show what you did ?

All I did was create a file /etc/crowdsec/parsers/s02-enrich/whitelist.yaml
with this as content

name: downloads whitelist

whitelist:
        reason: do not ban agents searching for files
        expression:
        - "'/downloads/' in haproxy.log"

I couldn’t find an explanation of what the ‘in xxxx’ should be referencing. It’s a source, but I’m not sure how to find out what the ‘reference’ for the source is, so that last line is probably wrong.

If I run cscli metrics, I can see that I have an active haproxy-logs parser:

+---------------------------------+--------+--------+----------+
|             PARSERS             |  HITS  | PARSED | UNPARSED |
+---------------------------------+--------+--------+----------+
| child-crowdsecurity/http-logs   | 18.16M | 18.02M | 147.35k  |
| child-crowdsecurity/syslog-logs | 11.90k | 11.90k | -        |
| crowdsecurity/dateparse-enrich  | 6.05M  | 6.05M  | -        |
| crowdsecurity/geoip-enrich      | 6.05M  | 6.05M  | -        |
| crowdsecurity/haproxy-logs      | 6.06M  | 6.05M  | 1.75k    |
| crowdsecurity/http-logs         | 6.05M  | 6.05M  | 218      |
| crowdsecurity/non-syslog        | 6.06M  | 6.06M  | -        |
| crowdsecurity/syslog-logs       | 11.90k | 11.90k | -        |
+---------------------------------+--------+--------+----------+

But my new whitelist.yaml file isn’t referenced anywhere. I assume I have to add an entry to a config file to ‘include’ whitelist.yaml… or maybe not?

So I have two questions really:

  • In the whitelist file, the expression section requires that you specify a path and a source - how do I work out the name of the source? Is there a cscli command which will display it?
  • How do I get crowdsec to ‘include’ my new whitelist.yaml file?

Hello,

Your whitelist expression applies to the parsed event (that is, the log line) directly, not to the source. By running cscli explain you can see all the parsed fields for a specific log.
For example, with the haproxy-logs parser you can find the HTTP request path in evt.Parsed.request (you can see this directly in the parser configuration: CrowdSec Hub).

So for your example, it can be something like:

name: whitelist_failed_download
description: "Whitelist failed download"
whitelist:
  reason: "404 download trigger FP"
  expression:
    - "evt.Meta.http_status == '404' and '/downloads/' in evt.Parsed.request" 

As for the inclusion in the parsers, I think the problem is the name of your whitelist.
This is not really user friendly, but I think the parser name (whitelist_failed_download in my example) should match the parser filename. Can you try renaming the file to whitelist_failed_download.yaml and setting its name to whitelist_failed_download, please?

Excellent, at least the service restarts now! I’ll have to wait a bit and see if the whitelist is working.

Since I didn’t have to link the file from another config file, I assume that CrowdSec simply loads all the files it finds under /etc/crowdsec.

I had found a whitelisting example somewhere on the site that used the syntax ‘in file.log’, and I copied it without understanding what the syntax meant.

I don’t know if you are one of the site maintainers, but if you are, it would be helpful if the docs contained an example file showing the full syntax, with each field linked to an explanation of its possible values. It would also be handy if the beginning of the getting started guide pointed out that most of the configuration is done with cscli rather than by editing config files.
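For reference, an annotated parser whitelist might look like the sketch below. It uses the fields discussed in this thread (name, description, whitelist.reason, whitelist.expression) plus the optional ip and cidr lists that the whitelist format also accepts. The addresses are hypothetical placeholders, and the evt.* field names are the ones suggested in this thread, so confirm them for your own logs with cscli explain --verbose. The substring test is written with contains rather than in, since in is meant for membership in a list. As observed in this thread, nothing else has to reference the file: a YAML file placed in the s02-enrich parser directory is picked up when the service restarts.

```yaml
# Hypothetical example, saved as
# /etc/crowdsec/parsers/s02-enrich/whitelist_failed_download.yaml
# (file name matching the "name" field, as suggested in this thread)
name: whitelist_failed_download           # identifier of this parser node (appears as name=... in log messages)
description: "Whitelist failed download"  # free-text description
whitelist:
  reason: "404 download trigger FP"       # free-text reason attached to whitelisted events
  ip:                                     # optional: exact source IPs (hypothetical placeholder)
    - "192.0.2.10"
  cidr:                                   # optional: source ranges (hypothetical placeholder)
    - "198.51.100.0/24"
  expression:                             # optional: expressions evaluated against each parsed event
    - "evt.Meta.http_status == '404' and evt.Parsed.request contains '/downloads/'"
```

If you only care about the URL-based exception, the ip and cidr lists can simply be left out.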

Anyway, it’s looking much better. Now when I restart the service, the log reports:
level=warning msg="Deprecation warning: the pid_dir config can be safely removed and is not required"

The only mention of pid_dir I can find is in the documentation, where it says:

pid_dir (string): Folder to store PID file.

Kind of strange that this started appearing after adding the whitelist file, but it isn’t going to affect things at the moment, and I can at least start playing with a working system.

So, thank you for the help.

Aagh, I have to ask you another related question. I’ve just tried to add a second path to the same file and it doesn’t work.
I followed this example:
Format | CrowdSec
which clearly shows you can have two expressions:

  expression:
    #beware, this one will work *only* if you enabled the reverse dns (crowdsecurity/rdns) enrichment postoverflow parser
    - evt.Enriched.reverse_dns endsWith ".mycoolorg.com."
    #this one will work *only* if you enabled the geoip (crowdsecurity/geoip-enrich) enrichment parser
    - evt.Enriched.IsoCode == 'FR'

but the following config generates an error:

name: whitelist_failed_download
description: "Whitelist failed download"

whitelist:
        reason: "404 download trigger FP"
        expression:
        - "evt.Meta.http_status == '404' and '/downloads/' in evt.Parsed.request"
        - "evt.Meta.http_status == '404' and '/updates/' in evt.Parsed.request"

It may be failing because evt.Meta.http_status appears twice instead of being combined with ‘or’, but I can’t find any specification of what a valid expression is.

This is getting weird. As I couldn’t get the syntax right, I went back to what you posted above:

name: whitelist_failed_download
description: "Whitelist failed download"
whitelist:
   reason: "404 download trigger FP"
   expression:
    - "evt.Meta.http_status == '404' and '/downloads/' in evt.Parsed.request"

but when I restart the service it fails and the log shows this:

time="04-05-2022 13:14:06" level=fatal msg="Unable to compile whitelist expression 'evt.Meta.http_status == '404' and '/downloads/' in evt.Parsed.request' : invalid operation: in (mismatched types string and string) (1:49)\n | evt.Meta.http_status == '404' and '/downloads/' in evt.Parsed.request\n | ................................................^." id=autumn-rain name=whitelist_failed_download stage=s02-enrich

this exact same file loaded correctly less than 30 minutes ago!

Hello,

For the pid_dir warning, I think there is a leftover of this parameter in your /etc/crowdsec/config.yaml file. You can safely remove it, but the warning doesn’t prevent CrowdSec from working.
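If you want to silence the warning, the leftover key would look something like the excerpt below. This is an illustrative sketch, assuming pid_dir sits under the common section as in older default configs; the value shown is a placeholder, and the other keys already in your file stay as they are.

```yaml
# Excerpt of /etc/crowdsec/config.yaml (assumed layout; only the relevant key is shown)
common:
  # pid_dir: /var/run/   # leftover key behind the deprecation warning: delete this line (placeholder value)
  # ...keep the other keys of your "common" section unchanged
```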

For your whitelist problem, does this whitelist work in your case?

name: whitelist_failed_download
description: "Whitelist failed download"

whitelist:
        reason: "404 download trigger FP"
        expression:
        - "evt.Meta.http_status == '404' and ( '/downloads/' in evt.Parsed.request or '/updates/' in evt.Parsed.request )"

Nope, I keep getting syntax errors. For the life of me I can’t see any problem with what you’ve posted. I’ve tried loads of variations on it and they all result in syntax errors too. I’ve also tried deleting everything and retyping it, just in case there’s a copy/paste issue with a symbol that looks the same but is actually a different character… but always the same result.


time="06-05-2022 11:27:05" level=fatal msg="Unable to compile whitelist expression 'evt.Meta.http_status == '404' and ('/downloads/' in evt.Parsed.request or '/updates/' in evt.Parsed.request)' : invalid operation: in (mismatched types string and string) (1:50)\n | evt.Meta.http_status == '404' and ('/downloads/' in evt.Parsed.request or '/updates/' in evt.Parsed.request)\n | .................................................^." id=rough-feather name=whitelist_failed_download stage=s02-enrich

evt.Meta.http_status=='404' by itself works fine
'/downloads/' in evt.Parsed.request results in the mismatched type error

however if I change the expression to

evt.Parsed.request contains '/downloads/'

it seems to accept it as being valid.

It also accepts

"evt.Meta.http_status == '404' and (evt.Parsed.request contains '/downloads/' or evt.Parsed.request contains '/updates/')"

What I can’t be sure about is whether the contains operator does a substring search or something else. Are you sure that evt.Parsed.request contains the requested URL? I would have expected .request to be an object with methods, so that to get the called URL I would have to use evt.Parsed.request.url or something similar.
Unfortunately there is almost no documentation on the evt.Parsed object… that I can find.

Any ideas?

If a site maintainer is reading this: open this page, Event | CrowdSec, and search for ‘parsed’; it has a single occurrence. That should really link to a page listing the methods of the ‘parsed’ object. I don’t think it’s documented anywhere on the site.

Hello,

I’m sorry, the in keyword is used to check whether a value is in a list. To check whether a string contains a substring, you must use the contains keyword.
For the evt.Parsed object, I suggest using the cscli explain --verbose command, which will show you the output fields of the parsers you have installed.
Note: we can’t document which fields the evt.Parsed object contains, since they depend on the parsers you have installed.

oooh, very nice.
If I understand correctly, this command uses reflection to display what options are available?
I can’t get the syntax right yet, but I like the idea.

Yes.
You give it a log line (or a file) and the log type, and it outputs the parsed fields.
For example, if you use the nginx-logs parser you can do:

cscli explain --verbose --log '192.1.1.1 - - [29/Sep/2021:15:49:49 +0200] "GET /downloads/ HTTP/1.1" 404 162 "-" "curl/7.68.0"' --type nginx

And this will display something like: