Filtering Filebeat input with or without Logstash

Question

In our current setup we use Filebeat to ship logs to an Elasticsearch instance. The application runs in AWS and writes its logs in JSON format.

For some reason AWS decided to prefix the log lines in a new platform release, and now the log parsing doesn't work. A log line now looks like this:

Apr 17 06:33:32 ip-172-31-35-113 web: {"@timestamp":"2020-04-17T06:33:32.691Z","@version":"1","message":"Tomcat started on port(s): 5000 (http) with context path ''","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}

Before it was simply:

{"@timestamp":"2020-04-17T06:33:32.691Z","@version":"1","message":"Tomcat started on port(s): 5000 (http) with context path ''","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}

Can we convert the log lines back into the old format without using Logstash? If not, how do I drop the prefix, and which filter is the best choice for this?

My current Filebeat configuration looks like this:

filebeat.inputs:
  - type: log
    paths:
      - /var/log/web-1.log
    json.keys_under_root: true
    json.ignore_decoding_error: true
    json.overwrite_keys: true
    fields_under_root: true
    fields:
      environment: ${ENV_NAME:not_set}
      app: myapp

cloud.id: "${ELASTIC_CLOUD_ID:not_set}"
cloud.auth: "${ELASTIC_CLOUD_AUTH:not_set}"

Answer 1

I would try to leverage the dissect and decode_json_fields processors:

processors:
  # first ignore the preamble and only keep the JSON data
  - dissect:
      tokenizer: "%{?ignore} %{+ignore} %{+ignore} %{+ignore} %{+ignore}: %{json}"
      field: "message"
      target_prefix: ""

  # then parse the JSON data
  - decode_json_fields:
      fields: ["json"]
      process_array: false
      max_depth: 1
      target: ""
      overwrite_keys: false
      add_error_key: true
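
For context, here is a rough sketch of how these processors might sit next to the input from the question. It is not a tested configuration. It assumes the json.* input options are dropped so that the raw line stays in "message" and the decoding is left to the processors, and the drop_fields step is my own addition to discard the intermediate "json" field.

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/web-1.log
    # Assumption: no json.* options here; the processors below do the decoding.
    fields_under_root: true
    fields:
      environment: ${ENV_NAME:not_set}
      app: myapp

processors:
  # Skip the five space-separated prefix tokens, keep the JSON payload in "json".
  - dissect:
      tokenizer: "%{?ignore} %{+ignore} %{+ignore} %{+ignore} %{+ignore}: %{json}"
      field: "message"
      target_prefix: ""

  # Decode the payload into the root of the event.
  - decode_json_fields:
      fields: ["json"]
      process_array: false
      max_depth: 1
      target: ""
      # Kept as in the answer; the question's input used json.overwrite_keys: true,
      # so switching this to true may be closer to the old behaviour (assumption).
      overwrite_keys: false
      add_error_key: true

  # Assumption: the intermediate field is not needed once it has been decoded.
  - drop_fields:
      fields: ["json"]
      ignore_missing: true

cloud.id: "${ELASTIC_CLOUD_ID:not_set}"
cloud.auth: "${ELASTIC_CLOUD_AUTH:not_set}"
```

Note that the tokenizer expects exactly one space between the prefix tokens, as in the sample line, so it is worth checking it against a few real lines before relying on it.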

Answer 2

Logstash has a json filter plugin that parses JSON from a source field, for instance a field called "message" that holds the whole raw log line.

filter {
    json {
        source => "message"
    }
}

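If you do not want to include the beginning part of the line, use the dissect filter in Logstash to strip it first, then point the json filter at the extracted field. It would be something like this:

filter {
    # skip everything up to the first ": " and keep the rest of the line
    dissect {
        mapping => {
            "message" => "%{}: %{message_without_prefix}"
        }
    }
    # parse the remaining JSON payload
    json {
        source => "message_without_prefix"
    }
}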

Filebeat may offer these two features as well, but in my experience I prefer working with Logstash for parsing and manipulating log data.

2 answers

solution1
1 ACCEPTED 2020-04-17 09:04:24

solution2
0 2020-04-17 11:31:14
