NGiNX Logs in GELF JSON Format

Hi All,

Love the idea of this security solution, and we are actively trialling it in non-production currently.

We are currently hamstrung from contributing our log files to the upstream analysis and analytics, because our access+error logs in NGiNX are in GELF-JSON format, for ingestion into Graylog2.

Has anyone solved this, or a similar problem? We have considered the following approaches -

  1. Write a 2nd log for each vHost in the default “combined” format. There doesn’t seem to be any simple way of doing this, however.
  2. Write a custom GROK pattern filter to override the default “combined” interpretation. Are there any pre-existing filters for the GELF JSON format, or guides for newbies writing GROK filters?

Example GELF JSON format logs, from nginx.conf -
log_format gelf_json escape=json '{ "timestamp": "$time_iso8601", '
    '"remote_addr": "$remote_addr", '
    '"connection": "$connection", '
    '"connection_requests": $connection_requests, '
    '"pipe": "$pipe", '
    '"body_bytes_sent": $body_bytes_sent, '
    '"request_length": $request_length, '
    '"request_time": $request_time, '
    '"response_status": $status, '
    '"request": "$request", '
    '"request_method": "$request_method", '
    '"host": "$host", '
    '"upstream_cache_status": "$upstream_cache_status", '
    '"upstream_addr": "$upstream_addr", '
    '"http_x_forwarded_for": "$http_x_forwarded_for", '
    '"http_referrer": "$http_referer", '
    '"http_user_agent": "$http_user_agent", '
    '"http_version": "$server_protocol", '
    '"remote_user": "$remote_user", '
    '"http_x_forwarded_proto": "$http_x_forwarded_proto", '
    '"upstream_response_time": "$upstream_response_time", '
    '"nginx_access": true }';

Hi,

I believe the produced logs are JSON, aren’t they?

We have no built-in parser that can be fed JSON logs for now, but we already have some features that can do it. You’ll find an example in the unit tests: https://github.com/crowdsecurity/crowdsec/tree/master/pkg/parser/tests/base-json-extract

I guess the resulting parser file should look like this:

filter: "evt.Parsed.program startsWith 'nginx'"
onsuccess: next_stage
#debug: true
name: crowdsecurity/nginx-logs
description: "Parse nginx access and error logs"
statics:
  - target: evt.StrTime
    expression: JsonExtract(evt.Line.Raw, "timestamp")
  - parsed: "logsource"
    value: "gelf-nginx"
  - meta: source_ip
    expression: JsonExtract(evt.Line.Raw, "remote_addr")
  - meta: http_status
    expression: JsonExtract(evt.Line.Raw, "response_status")
  - meta: http_path
    expression: JsonExtract(evt.Line.Raw, "request")
  - meta: log_type
    value: http_access-log

This file should be put in the /etc/crowdsec/config/parsers/s00-raw directory, but I can’t test it because I don’t have any of your logs. If you want, you can send us a sample of your logs and we’ll have a better chance of providing you with a ready-to-use parser file.

Hi Kaa,

Thanks very much for the prompt and informative reply.

Please find an example log file here: https://www.veepshosting.com/ssl-redacted.domain.co.access.log.gz

I’m getting conflicting feedback on whether or not it’s working.

The primary log file ‘/var/log/crowdsec.log’ has the following entry, using the parser above -
time="04-11-2020 18:06:01" level=debug msg="+ Processing 2 statics" func="github.com/crowdsecurity/crowdsec/pkg/parser.(*Node).process" file="/home/runner/work/crowdsec/crowdsec/pkg/parser/node.go:313" id=long-lake name=crowdsecurity/non-syslog stage=s00-raw
time="04-11-2020 18:06:01" level=debug msg=".Parsed[message] = '{ "timestamp": "2020-11-04T18:06:01+11:00", "remote_addr": "54.162.224.1", "connection": "50343", "connection_requests": 1, "pipe": ".", "body_bytes_sent": 161205, "request_length": 325, "request_time": 0.228, "response_status": 200, "request": "GET /media/catalog/product/b/e/bella_rosa.jpg HTTP/1.1", "request_method": "GET", "host": "redacted.domain.co", "upstream_cache_status": "", "upstream_addr": "", "http_x_forwarded_for": "", "http_referrer": "", "http_user_agent": "Ruby", "http_version": "HTTP/1.1", "remote_user": "", "http_x_forwarded_proto": "", "upstream_response_time": "", "nginx_access": true }'" func=github.com/crowdsecurity/crowdsec/pkg/parser.ProcessStatics file="/home/runner/work/crowdsec/crowdsec/pkg/parser/runtime.go:175" id=long-lake name=crowdsecurity/non-syslog stage=s00-raw
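
For anyone following along, the lookups that the statics perform on a line like the one above can be mimicked outside CrowdSec. Below is a minimal Python sketch using only the standard json module. json_extract is a hypothetical stand-in for JsonExtract, not CrowdSec's implementation, and returning an empty string for a missing key is an assumption made for this sketch. The sample line is the one from the debug output above, trimmed to a few fields.

```python
import json

# Sample line from the debug output above, trimmed to a few fields.
RAW_LINE = (
    '{ "timestamp": "2020-11-04T18:06:01+11:00", '
    '"remote_addr": "54.162.224.1", '
    '"response_status": 200, '
    '"request": "GET /media/catalog/product/b/e/bella_rosa.jpg HTTP/1.1", '
    '"http_user_agent": "Ruby" }'
)


def json_extract(raw, key):
    """Hypothetical helper: return one top-level field as a string.

    Returning "" for a missing key is a choice made for this sketch.
    """
    value = json.loads(raw).get(key)
    return "" if value is None else str(value)


# Same field-to-meta mapping as the statics in the parser file.
meta = {
    "source_ip": json_extract(RAW_LINE, "remote_addr"),
    "http_status": json_extract(RAW_LINE, "response_status"),
    "http_path": json_extract(RAW_LINE, "request"),
}
print(meta)
```

With this helper, a key name that does not exist in the line (a misspelled field name, for instance) yields an empty value, which is one thing worth checking when a static ends up blank.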

Yet the metrics command doesn’t show any lines parsed -

sudo cscli metrics

INFO[0000] Buckets Metrics:
+--------+---------------+-----------+--------------+--------+---------+
| BUCKET | CURRENT COUNT | OVERFLOWS | INSTANCIATED | POURED | EXPIRED |
+--------+---------------+-----------+--------------+--------+---------+
+--------+---------------+-----------+--------------+--------+---------+
INFO[0000] Acquisition Metrics:
+----------------------------------------------------+------------+--------------+----------------+------------------------+
| SOURCE                                             | LINES READ | LINES PARSED | LINES UNPARSED | LINES POURED TO BUCKET |
+----------------------------------------------------+------------+--------------+----------------+------------------------+
| /var/log/auth.log                                  | 24         | -            | 24             | -                      |
| /var/log/nginx/ssl-obfuscated.domain.co.access.log | 132        | -            | 132            | -                      |
| /var/log/syslog                                    | 11         | -            | 11             | -                      |
+----------------------------------------------------+------------+--------------+----------------+------------------------+
INFO[0000] Parser Metrics:
+-------------------------------+------+--------+----------+
| PARSERS                       | HITS | PARSED | UNPARSED |
+-------------------------------+------+--------+----------+
| child-crowdsecurity/sshd-logs | 10   | -      | 10       |
| crowdsecurity/non-syslog      | 132  | 132    | -        |
| crowdsecurity/sshd-logs       | 2    | -      | 2        |
| crowdsecurity/syslog-logs     | 35   | 35     | -        |
+-------------------------------+------+--------+----------+

The nginx-gelp thingy only gets unparsed logs, so this is not working.

OTOH the installation seems functional, because syslog did parse 35 log lines. But that parser has to be followed by another one (and the nginx-gelp thingy is not eligible). From your configuration it seems that, besides the nginx-gelp thingy, only ssh is enabled, so ssh is the only one able to trigger anything, and it was fed only two log lines. This actually seems legit.

I’ll take some time to dig into your logs very soon.

Ok I got the parsing to work.

  1. You’ll have to configure your /etc/crowdsec/config/acquis.yaml with something like
filenames:
  - /var/log/<the generated gelp-nginx log file>
labels:
  type: gelp-nginx
---
  2. Then add the following configuration as a parser file in /etc/crowdsec/config/parsers/s00-raw. Any file name with the .yaml extension will do.
filter: "evt.Line.Labels.type == 'nginx-gelp'"
onsuccess: next_stage
#debug: true
name: crowdsecurity/nginx-logs
description: "Parse nginx access and error logs"
statics:
  - target: evt.StrTime
    expression: JsonExtract(evt.Line.Raw, "timestamp")
  - parsed: "logsource"
    value: "gelf-nginx"
  - parsed: remote_addr
    expression: JsonExtract(evt.Line.Raw, "remote_addr")   
  - parsed: remote_user
    expression: JsonExtract(evt.Line.Raw, "remote_user")   
  - meta: source_ip
    expression: JsonExtract(evt.Line.Raw, "remote_addr")
  - meta: http_status
    expression: JsonExtract(evt.Line.Raw, "response_status")
  - meta: http_path
    expression: JsonExtract(evt.Line.Raw, "request")
  - meta: log_type
    value: http_access-log
  - meta: service
    value: http
  - parsed: http_user_agent
    expression: JsonExtract(evt.Line.Raw, "http_user_agent")
  - parsed: http_referer
    expression: JsonExtract(evt.Line.Raw, "http_referrer")
  - parsed: target_fqdn
    expression: JsonExtract(evt.Line.Raw, "host")
  - parsed: method
    expression: JsonExtract(evt.Line.Raw, "request_method")
  - parsed: body_bytes_sent
    expression: JsonExtract(evt.Line.Raw, "body_bytes_sent")
  - parsed: http_version
    expression: JsonExtract(evt.Line.Raw, "http_version")   
  - parsed: status
    expression: JsonExtract(evt.Line.Raw, "response_status")

Please keep in mind that the label in the expression "evt.Line.Labels.type == 'nginx-gelp'" has to match the label in the acquis.yaml file.

  3. A final step is required to connect all the dots with the http-related scenarios. Add the following file in /etc/crowdsec/config/parsers/s01-parse. Any file name with the .yaml extension will do:
filter: "evt.Meta.service == 'http' && evt.Meta.log_type in ['http_access-log', 'http_error-log']"
onsuccess: next_stage
name: local/gelp-nginx-request
nodes:
  - grok:
      pattern: '%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}'
      apply_on: full_request
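
To see roughly what this grok node is meant to produce, here is a small Python sketch that splits an nginx request string into the same three captures. The regular expression is a simplified approximation, not the actual WORD, URIPATHPARAM and NUMBER grok definitions, and parse_full_request is a name made up for the example. The request string comes from the sample log posted earlier in the thread.

```python
import re

# Rough stand-in for the grok pattern; the real WORD, URIPATHPARAM and NUMBER
# definitions are stricter than these simplified groups.
REQUEST_RE = re.compile(
    r"(?P<method>\w+) (?P<request>/\S*) HTTP/(?P<http_version>\d+(?:\.\d+)?)"
)


def parse_full_request(full_request):
    """Return method, request and http_version, or None if nothing matches."""
    match = REQUEST_RE.search(full_request)
    return match.groupdict() if match else None


fields = parse_full_request("GET /media/catalog/product/b/e/bella_rosa.jpg HTTP/1.1")
print(fields)
```

If the string handed to the pattern is empty, the sketch returns None, which mirrors the general idea that a grok node has nothing to capture when the field named in apply_on holds no value.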

Thanks Kaa, we are testing it currently; however, the log files that match are still sitting in “Unparsed”. No errors relating to the above config are listed in the main crowdsec.log log file, so possibly it’s just a matter of waiting a bit longer.

cscli metrics
INFO[0000] Buckets Metrics:
+--------------------------------+---------------+-----------+--------------+--------+---------+
| BUCKET                         | CURRENT COUNT | OVERFLOWS | INSTANCIATED | POURED | EXPIRED |
+--------------------------------+---------------+-----------+--------------+--------+---------+
| crowdsecurity/ssh-bf           | -             | -         | 12           | 13     | 12      |
| crowdsecurity/ssh-bf_user-enum | -             | -         | 12           | 12     | 12      |
+--------------------------------+---------------+-----------+--------------+--------+---------+
INFO[0000] Acquisition Metrics:
+-----------------------------------------------+------------+--------------+----------------+------------------------+
| SOURCE                                        | LINES READ | LINES PARSED | LINES UNPARSED | LINES POURED TO BUCKET |
+-----------------------------------------------+------------+--------------+----------------+------------------------+
| /var/log/auth.log                             | 877        | 13           | 864            | 25                     |
| /var/log/nginx/ssl-server.access.log          | 1          | -            | 1              | -                      |
| /var/log/nginx/ssl-server.error.log           | 1          | -            | 1              | -                      |
| /var/log/nginx/ssl-preprod.site.co.access.log | 1          | -            | 1              | -                      |
| /var/log/nginx/ssl-test.site.co.access.log    | 1814       | -            | 1814           | -                      |
| /var/log/syslog                               | 517        | -            | 517            | -                      |
+-----------------------------------------------+------------+--------------+----------------+------------------------+
INFO[0000] Parser Metrics:
+--------------------------------+------+--------+----------+
| PARSERS                        | HITS | PARSED | UNPARSED |
+--------------------------------+------+--------+----------+
| child-crowdsecurity/sshd-logs  | 836  | 13     | 823      |
| child-local/gelf-nginx         | 526  | -      | 526      |
| crowdsecurity/dateparse-enrich | 13   | 13     | -        |
| crowdsecurity/geoip-enrich     | 13   | 13     | -        |
| crowdsecurity/nginx-logs       | 526  | 526    | -        |
| crowdsecurity/non-syslog       | 1291 | 1291   | -        |
| crowdsecurity/sshd-logs        | 171  | 13     | 158      |
| crowdsecurity/syslog-logs      | 1394 | 1394   | -        |
| crowdsecurity/whitelists       | 13   | 13     | -        |
| local/gelf-nginx               | 526  | -      | 526      |
+--------------------------------+------+--------+----------+

Hi @grant-veepshosting,

You should definitely see higher counts in the parsed column.

But when I reread the config I pasted, I found a nasty typo: type: gelp-nginx in the first step should match the nginx-gelp in the filter line in the second step. These two entries have to match for the parser to know which type of log it’s working on.

Replacing your /etc/crowdsec/config/acquis.yaml with

filenames:
  - /var/log/<the generated gelp-nginx log file>
labels:
  type: nginx-gelp

should do the trick

Sorry for this
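
As a quick sanity check for this kind of mismatch, the label in acquis.yaml can be compared with the one used in the parser filter. The Python sketch below does that with two regular expressions. It assumes the label is written as type: <value> and the filter has the form evt.Line.Labels.type == '<value>'. The function names are made up for the example, and this is not a cscli feature.

```python
import re


def acquis_label(acquis_text):
    """Return the value of the 'type:' label in an acquis.yaml-style snippet."""
    match = re.search(r"^\s*type:\s*(\S+)\s*$", acquis_text, re.MULTILINE)
    return match.group(1) if match else None


def filter_label(parser_text):
    """Return the label a parser filter compares evt.Line.Labels.type against."""
    match = re.search(r"evt\.Line\.Labels\.type\s*==\s*'([^']+)'", parser_text)
    return match.group(1) if match else None


def labels_match(acquis_text, parser_text):
    """True only when both labels were found and are identical."""
    label = acquis_label(acquis_text)
    return label is not None and label == filter_label(parser_text)


# The two snippets as first posted in this thread (labels do not match).
ACQUIS_AS_POSTED = """filenames:
  - /var/log/<the generated gelp-nginx log file>
labels:
  type: gelp-nginx
"""
# The corrected acquis.yaml from this post.
ACQUIS_CORRECTED = """filenames:
  - /var/log/<the generated gelp-nginx log file>
labels:
  type: nginx-gelp
"""
PARSER_FILTER = "filter: \"evt.Line.Labels.type == 'nginx-gelp'\"\n"

print(labels_match(ACQUIS_AS_POSTED, PARSER_FILTER))
print(labels_match(ACQUIS_CORRECTED, PARSER_FILTER))
```

The exact label value does not matter to this check, only that the two sides agree.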

Hey Kaa,

I noticed this typo and corrected it yesterday, thanks for following up. Also gelp should be gelf.

According to the debug logging, it’s being processed, but not according to any cscli metrics / statistics.

Example debug log entry, anonymised -

https://www.veepshosting.com/nginx_gelf_debug_example.log.gz

Hi @grant-veepshosting,
Yes, indeed, but it still lacks a step. The s00-raw step is working fine, but the s01-parse step isn’t.
I guess a typo still got through my testing, because the apply_on directive in the last file doesn’t match anything coming from the s00-raw stage. To fix this, you’ll have to add the following lines to the file in the s00-parse stage:

  - parsed: full_request
    expression: JsonExtract(evt.Line.Raw, "request")

Furthermore, the last cscli metrics output you showed us indicates that the base-http-scenarios collection is not installed. I wrote the whole gelf parser (sorry for confusing the p with the f) to be compatible with this collection.

You can do cscli install collection crowdsecurity/base-http-scenarios to install it.

Thank you kaa!!! It works now, thanks very much for sticking in there.

Would you like me to post the configs in full for reference, or possible addition to the code base?

PS: I had to add the “request” section above to the s00-raw .yaml file, not the s01-parse .yaml file, for it to work correctly.

Hi @grant-veepshosting,

Yes, it would be great to have this. At some point, we may want to add this to the official stuff in the hub.

Thanks for your feedback!

/etc/crowdsec/config/parsers/s00-raw/nginx-gelf.yaml:
filter: "evt.Line.Labels.type == 'gelf-nginx'"
onsuccess: next_stage
debug: true
name: crowdsecurity/nginx-logs
description: "Parse nginx access and error logs"
statics:
  - target: evt.StrTime
    expression: JsonExtract(evt.Line.Raw, "timestamp")
  - parsed: "logsource"
    value: "gelf-nginx"
  - parsed: remote_addr
    expression: JsonExtract(evt.Line.Raw, "remote_addr")
  - parsed: remote_user
    expression: JsonExtract(evt.Line.Raw, "remote_user")
  - meta: source_ip
    expression: JsonExtract(evt.Line.Raw, "remote_addr")
  - meta: http_status
    expression: JsonExtract(evt.Line.Raw, "response_status")
  - meta: http_path
    expression: JsonExtract(evt.Line.Raw, "request")
  - meta: log_type
    value: http_access-log
  - meta: service
    value: http
  - parsed: http_user_agent
    expression: JsonExtract(evt.Line.Raw, "http_user_agent")
  - parsed: http_referer
    expression: JsonExtract(evt.Line.Raw, "http_referrer")
  - parsed: target_fqdn
    expression: JsonExtract(evt.Line.Raw, "host")
  - parsed: method
    expression: JsonExtract(evt.Line.Raw, "request_method")
  - parsed: body_bytes_sent
    expression: JsonExtract(evt.Line.Raw, "body_bytes_sent")
  - parsed: http_version
    expression: JsonExtract(evt.Line.Raw, "http_version")
  - parsed: status
    expression: JsonExtract(evt.Line.Raw, "response_status")
  - parsed: full_request
    expression: JsonExtract(evt.Line.Raw, "request")

/etc/crowdsec/config/parsers/s01-parse/nginx-gelf-logs.yaml:
filter: "evt.Meta.service == 'http' && evt.Meta.log_type in ['http_access-log', 'http_error-log']"
onsuccess: next_stage
name: local/gelf-nginx
nodes:
  - grok:
      pattern: '%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}'
      apply_on: full_request