Filebeat and LogStash — data in multiple different formats

Question

I have Filebeat, Logstash, Elasticsearch and Kibana. Filebeat runs on a separate server and is supposed to receive data in different formats (syslog, JSON, from a database, etc.) and send it to Logstash.

I know how to set up Logstash to handle a single format, but since there are multiple data formats, how would I configure Logstash to handle each one properly?

In fact, how can I set up both Logstash and Filebeat so that data in all these formats is sent from Filebeat and received by Logstash properly? I mean the config settings that handle sending and receiving data.

Answer 1

To separate different types of input within the Logstash pipeline, use the type field, and use tags for further identification.

In your Filebeat configuration, use a separate prospector for each data format. Each prospector can then be given a different document_type: value.

For example:

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      # Paths that should be crawled and fetched. Glob based paths.
      # For each file found under this path, a harvester is started.
      paths:
        - "/var/log/apache/httpd-*.log"
      # Type to be published in the 'type' field. For Elasticsearch output,
      # the type defines the document type these entries should be stored
      # in. Default: log
      document_type: apache
    -
      paths:
        - /var/log/messages
        - "/var/log/*.log"
      document_type: log_message

In the above example, logs from /var/log/apache/httpd-*.log will have document_type: apache, while logs from the other prospector will have document_type: log_message.

This document_type field becomes the type field when Logstash processes the event. You can then use if statements in Logstash to apply different processing to different types.

For example:

filter {
  if [type] == "apache" {
    # apache specific processing
  }
  else if [type] == "log_message" {
    # log_message processing
  }
}
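
As a rough sketch of what could go inside those branches: the version below assumes the Apache files are in the combined access-log format and that the other files contain traditional syslog lines. Neither assumption comes from the question, so swap in patterns that match your actual data. COMBINEDAPACHELOG and SYSLOGLINE are stock grok patterns.

```
filter {
  if [type] == "apache" {
    # Assumption: combined access-log format
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
  else if [type] == "log_message" {
    # Assumption: traditional syslog lines
    grok {
      match => { "message" => "%{SYSLOGLINE}" }
    }
  }
}
```

Events whose message does not match the pattern are tagged _grokparsefailure by default, which is a handy thing to look for while tuning.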

Answer 2

If the "data formats" in your question are codecs, they have to be configured in the Logstash input. The following applies to Filebeat 1.x and Logstash 2.x, not the Elastic 5 stack. In our setup, we have two beats inputs; the first uses the default codec, "plain":

beats {
    port                => 5043
}
beats {
    port                => 5044
    codec               => "json"
}

On the Filebeat side, we need two Filebeat instances, each sending its output to its respective port. It's not possible to tell Filebeat "route this prospector to that output".

Logstash documentation: https://www.elastic.co/guide/en/logstash/2.4/plugins-inputs-beats.html

Remark: If you ship with different protocols, e.g. legacy logstash-forwarder / lumberjack, you need more ports.
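
If running two Filebeat instances is not practical, one possible alternative is to keep a single beats input with the default codec and decode the JSON in the filter stage instead. This is only a sketch: it assumes each JSON document arrives as one line in the message field, and "json_log" is a hypothetical document_type that you would set on the corresponding Filebeat prospector.

```
input {
  beats {
    port => 5044
  }
}

filter {
  # "json_log" is a hypothetical type name set on the Filebeat side
  if [type] == "json_log" {
    json {
      # Parse the raw line and add its keys to the event
      source => "message"
    }
  }
}
```

Events of any other type pass through this filter untouched, so plain-text logs can share the same port.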

Answer 3

Supported with 7.5.1

filebeat-multifile.yml (Filebeat installed on one machine)

filebeat.inputs:
- type: log
  tags: ["gunicorn"]
  paths:
    - "/home/hduser/Data/gunicorn-100.log"

- type: log
  tags: ["apache"]
  paths:
    - "/home/hduser/Data/apache-access-100.log"

output.logstash:
  hosts: ["0.0.0.0:5044"] # target Logstash IP

gunicorn-apache-log.conf (Logstash installed on another machine)

input {
  beats {
    port => "5044"
    host => "0.0.0.0" 
  }
}

filter {
    if "gunicorn" in [tags] {
        grok {
            match => { "message" => "%{USERNAME:u1} %{USERNAME:u2} \[%{HTTPDATE:http_date}\] \"%{DATA:http_verb} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:android_client}\"" }
            remove_field => "message"
        }
    }
    else if "apache" in [tags] {
        grok {
            match => { "message" => "%{IPORHOST:client_ip} %{DATA:u1} %{DATA:u2} \[%{HTTPDATE:http_date}\] \"%{WORD:http_method} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:gd}\" \"%{DATA:u3}\""}
            remove_field => "message"
        }

    }
}

output {
    if "gunicorn" in [tags]{
        stdout { codec => rubydebug }

        elasticsearch {
        hosts => [...]
        index => "gunicorn-index"
        }

    }
    else if "apache" in [tags]{
        stdout { codec => rubydebug }

        elasticsearch {
             hosts => [...]
             index => "apache-index"
        }
    }
}
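
While tuning the grok patterns it can help to see only the lines that failed to parse. A minimal sketch, assuming grok's default failure tag (_grokparsefailure) has not been changed via tag_on_failure, is to temporarily use an output like this in place of the one above:

```
output {
  if "_grokparsefailure" in [tags] {
    # Print only events that did not match any pattern
    stdout { codec => rubydebug }
  }
}
```

Because remove_field is only applied when grok succeeds, these events still carry the original message, so the offending line is visible in the output.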

Run Filebeat from the binary. First give the config file the proper ownership and permissions:

sudo chown root:root filebeat-multifile.yml
sudo chmod go-w filebeat-multifile.yml
sudo ./filebeat -e -c filebeat-multifile.yml -d "publish"

Run Logstash from the binary:

./bin/logstash -f gunicorn-apache-log.conf

3 answers

solution1
4 2016-06-07 16:33:15

solution2
1 2016-11-27 11:03:19

solution3
1 2020-01-07 10:41:33
