
Filebeat and LogStash — data in multiple different formats

I have Filebeat, Logstash, Elasticsearch and Kibana. Filebeat runs on a separate server and is supposed to receive data in different formats (syslog, JSON, data from a database, etc.) and send it to Logstash.

I know how to set up Logstash to handle a single format, but since there are multiple data formats, how would I configure Logstash to handle each one properly?

More generally, how can I set up both Logstash and Filebeat so that data in all of these formats is sent from Filebeat and received by Logstash properly? I mean the config settings that handle sending and receiving data.

To separate different types of input within the Logstash pipeline, use the type field, and use tags for finer-grained identification.

In your Filebeat configuration, use a separate prospector for each data format. Each prospector can then be given a different document_type: value.

For example:

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      # Paths that should be crawled and fetched. Glob based paths.
      # For each file found under this path, a harvester is started.
      paths:
        - "/var/log/apache/httpd-*.log"
      # Type to be published in the 'type' field. For Elasticsearch output,
      # the type defines the document type these entries should be stored
      # in. Default: log
      document_type: apache
    -
      paths:
        - /var/log/messages
        - "/var/log/*.log"
      document_type: log_message

In the above example, logs from /var/log/apache/httpd-*.log will have document_type: apache, while logs picked up by the other prospector will have document_type: log_message.

This document_type field becomes the type field when Logstash processes the event. You can then use if statements in Logstash to apply different processing to different types.

For example:

filter {
  if [type] == "apache" {
    # apache specific processing
  }
  else if [type] == "log_message" {
    # log_message processing
  }
}
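
As a minimal sketch of what those branches might contain, assume the apache files are in the combined log format and the other files are classic syslog lines (neither is stated in the question). Under that assumption, the stock COMBINEDAPACHELOG and SYSLOGLINE grok patterns could be used:

```
filter {
  if [type] == "apache" {
    # Assumption: access logs are in Apache "combined" format
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    # Use the request time from the log line as the event timestamp
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  }
  else if [type] == "log_message" {
    # Assumption: these files contain standard syslog lines
    grok {
      match => { "message" => "%{SYSLOGLINE}" }
    }
  }
}
```

Lines that do not match their pattern are not dropped. By default grok adds a _grokparsefailure tag to them, which is a quick way to check whether the format assumptions hold for your files.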

If the "data formats" in your question are codecs, they have to be configured in the Logstash input. The following applies to Filebeat 1.x and Logstash 2.x, not the Elastic 5 stack. In our setup, we have two beats inputs. The first uses the default codec, "plain", and the second uses "json":

input {
    beats {
        port                => 5043
    }
    beats {
        port                => 5044
        codec               => "json"
    }
}

On the Filebeat side, we need two Filebeat instances, each sending its output to its respective port. It's not possible to tell Filebeat "route this prospector to that output".

Logstash documentation: https://www.elastic.co/guide/en/logstash/2.4/plugins-inputs-beats.html

Remark: If you ship with different protocols, e.g. legacy logstash-forwarder / lumberjack, you need more ports.
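
A different technique that avoids the second port and the second Filebeat instance is to keep a single beats input with the default codec and decode JSON in the filter stage, based on the event type. This is only a sketch and not what the setup above does. The type name "json_log" is hypothetical and would have to match a document_type you set on the prospector that reads the JSON files:

```
input {
  beats {
    port => 5044
  }
}

filter {
  # "json_log" is a hypothetical document_type, not taken from the setup above
  if [type] == "json_log" {
    # Parse the raw line as JSON and add its keys to the event
    json {
      source => "message"
    }
  }
}
```

The trade-off is that JSON parsing happens per event in the filter rather than in the input codec, but all prospectors can then share one Filebeat instance and one port.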

The following approach is supported in 7.5.1.

filebeat-multifile.yml (Filebeat installed on one machine):

filebeat.inputs:
- type: log
  tags: ["gunicorn"]
  paths:
    - "/home/hduser/Data/gunicorn-100.log"

- type: log
  tags: ["apache"]
  paths:
    - "/home/hduser/Data/apache-access-100.log"

output.logstash:
  hosts: ["0.0.0.0:5044"]  # target Logstash IP

gunicorn-apache-log.conf (Logstash installed on another machine):

input {
  beats {
    port => 5044
    host => "0.0.0.0"
  }
}

filter {
    if "gunicorn" in [tags] {
        grok {
            match => { "message" => "%{USERNAME:u1} %{USERNAME:u2} \[%{HTTPDATE:http_date}\] \"%{DATA:http_verb} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:android_client}\"" }
            remove_field => "message"
        }
    }
    else if "apache" in [tags] {
        grok {
            match => { "message" => "%{IPORHOST:client_ip} %{DATA:u1} %{DATA:u2} \[%{HTTPDATE:http_date}\] \"%{WORD:http_method} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:gd}\" \"%{DATA:u3}\""}
            remove_field => "message"
        }

    }
}

output {
    if "gunicorn" in [tags] {
        stdout { codec => rubydebug }

        elasticsearch {
            hosts => [...]
            index => "gunicorn-index"
        }
    }
    else if "apache" in [tags] {
        stdout { codec => rubydebug }

        elasticsearch {
            hosts => [...]
            index => "apache-index"
        }
    }
}
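
If the number of tags grows, repeating an elasticsearch block per tag gets unwieldy. One possible variation, sketched here under assumptions, is to store the index name in a [@metadata] field in the filter stage and use a single output. The field name target_index, the fallback index "unmatched-index" and the host "localhost:9200" are placeholders of mine, not values from the configuration above:

```
filter {
  # [@metadata] fields are available inside the pipeline but are not
  # written to Elasticsearch with the event.
  if "gunicorn" in [tags] {
    mutate { add_field => { "[@metadata][target_index]" => "gunicorn-index" } }
  }
  else if "apache" in [tags] {
    mutate { add_field => { "[@metadata][target_index]" => "apache-index" } }
  }
  else {
    # Hypothetical catch-all so the index reference below always resolves
    mutate { add_field => { "[@metadata][target_index]" => "unmatched-index" } }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder, use your own Elasticsearch hosts
    index => "%{[@metadata][target_index]}"
  }
}
```

These mutate blocks would sit alongside the grok filters in the same pipeline. Unlike the configuration above, events carrying neither tag are indexed here (into the catch-all) rather than silently discarded, so drop the else branch and guard the output if you want the original behaviour.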

Run Filebeat from the binary. First give the config file the proper ownership and permissions:

sudo chown root:root filebeat-multifile.yml
sudo chmod go-w filebeat-multifile.yml
sudo ./filebeat -e -c filebeat-multifile.yml -d "publish"

Run Logstash from the binary:

./bin/logstash -f gunicorn-apache-log.conf
