Filebeat and Logstash: multiple different data formats

Question

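I have Filebeat, Logstash, Elasticsearch, and Kibana. Filebeat runs on a separate server and is supposed to receive data in different formats (syslog, JSON, data from a database, and so on) and send it to Logstash.

I know how to set up Logstash to handle a single format, but with multiple data formats, how do I configure Logstash to handle each one correctly?

In fact, how do I set up both Logstash and Filebeat so that data in all the different formats is sent from Filebeat and correctly received by Logstash? I mean the configuration settings that handle sending and receiving the data.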

Answer 1

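To separate different types of input within the Logstash pipeline, use the type field, and use tags for further identification.

In your Filebeat configuration, use a separate prospector for each data format. Each prospector can then be given a different document_type: field.

For example: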

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      # Paths that should be crawled and fetched. Glob based paths.
      # For each file found under this path, a harvester is started.
      paths:
        - "/var/log/apache/httpd-*.log"
      # Type to be published in the 'type' field. For Elasticsearch output,
      # the type defines the document type these entries should be stored
      # in. Default: log
      document_type: apache
    -
      paths:
        - /var/log/messages
        - "/var/log/*.log"
      document_type: log_message

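In the example above, logs from /var/log/apache/httpd-*.log get document_type: apache, while logs from the other prospector get document_type: log_message.

When Logstash processes an event, this document_type field becomes the type field. You can then use if statements in Logstash to process each type differently.

For example: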

filter {
  if [type] == "apache" {
    # apache specific processing
  }
  else if [type] == "log_message" {
    # log_message processing
  }
}
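
To make the placeholder comments above concrete, here is a minimal sketch of what the two branches might contain. It assumes the Apache files are access logs in the combined format and that the other files hold classic syslog lines; neither assumption comes from the original answer, so swap in patterns that match your actual log layout. COMBINEDAPACHELOG and SYSLOGLINE are patterns shipped with the grok filter.

```
filter {
  if [type] == "apache" {
    # Assumption: combined access-log format; error logs would need another pattern.
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
  else if [type] == "log_message" {
    # Assumption: classic syslog lines such as those in /var/log/messages.
    grok {
      match => { "message" => "%{SYSLOGLINE}" }
    }
  }
}
```

Events that match neither condition pass through the filter section unchanged.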

Answer 2

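If the "data formats" in your question are codecs, they have to be configured in the Logstash input. The following applies to Filebeat 1.x and Logstash 2.x, not the Elastic 5 stack. In our setup we have two beats inputs; the first uses the default codec, "plain":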

beats {
    port                => 5043
}
beats {
    port                => 5044
    codec               => "json"
}

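On the Filebeat side, we need two Filebeat instances, each sending its output to its own port. There is no way to tell Filebeat "route this prospector to that output".

Logstash documentation: https://www.elastic.co/guide/en/logstash/2.4/plugins-inputs-beats.html

Note: if you also use other protocols, such as the legacy logstash-forwarder / lumberjack, you need more ports.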
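
A different approach, not the one this answer uses, is to keep a single beats input with the default codec and decode JSON in the filter stage instead. The sketch below assumes the prospector that reads JSON lines sets a document_type of "json_app"; that name is hypothetical and only serves the illustration.

```
input {
  beats {
    port => 5044
  }
}

filter {
  # "json_app" is a hypothetical document_type set by the JSON prospector.
  if [type] == "json_app" {
    # Parse the raw line held in "message" into top-level event fields.
    json {
      source => "message"
    }
  }
}
```

With this layout one Filebeat instance and one port may be enough, at the cost of doing the JSON decoding in a filter rather than in the input codec.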

Answer 3

Works with 7.5.1

filebeat-multifile.yml // Filebeat, installed on one machine

filebeat.inputs:
- type: log
  tags: ["gunicorn"]
  paths:
    - "/home/hduser/Data/gunicorn-100.log"

- type: log
  tags: ["apache"]
  paths:
    - "/home/hduser/Data/apache-access-100.log"

output.logstash:
  hosts: ["0.0.0.0:5044"]  # target Logstash IP

gunicorn-apache-log.conf // Logstash, installed on another machine

input {
  beats {
    port => "5044"
    host => "0.0.0.0" 
  }
}

filter {
    if "gunicorn" in [tags] {
        grok {
            match => { "message" => "%{USERNAME:u1} %{USERNAME:u2} \[%{HTTPDATE:http_date}\] \"%{DATA:http_verb} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:android_client}\"" }
            remove_field => "message"
        }
    }
    else if "apache" in [tags] {
        grok {
            match => { "message" => "%{IPORHOST:client_ip} %{DATA:u1} %{DATA:u2} \[%{HTTPDATE:http_date}\] \"%{WORD:http_method} %{URIPATHPARAM:api} %{DATA:http_version}\" %{NUMBER:status_code} %{NUMBER:byte} \"%{DATA:external_api}\" \"%{GREEDYDATA:gd}\" \"%{DATA:u3}\""}
            remove_field => "message"
        }

    }
}

output {
    if "gunicorn" in [tags]{
        stdout { codec => rubydebug }

        elasticsearch {
        hosts => [...]
        index => "gunicorn-index"
        }

    }
    else if "apache" in [tags]{
        stdout { codec => rubydebug }

        elasticsearch {
             hosts => [...]
             index => "apache-index"
        }
    }
}
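
One thing the configuration above does not handle is a line that fails to match its grok pattern. By default grok tags such events with _grokparsefailure, and remove_field is applied only on a successful match, so the failed event keeps its message field and still goes to the index for its tag. As a debugging aid, a minimal sketch of an extra output block that prints those events to the console (it can sit in the same file next to the existing output block):

```
output {
  # _grokparsefailure is the tag grok adds by default when no pattern matches.
  if "_grokparsefailure" in [tags] {
    stdout { codec => rubydebug }
  }
}
```

This only makes the failures visible; it does not stop them from reaching Elasticsearch through the existing output conditions.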

Run Filebeat from the binary, first giving the config file the appropriate ownership and permissions:

sudo chown root:root filebeat-multifile.yml
sudo chmod go-w filebeat-multifile.yml
sudo ./filebeat -e -c filebeat-multifile.yml -d "publish"

Run Logstash from the binary:

./bin/logstash -f gunicorn-apache-log.conf

Answer 1
4 2016-06-07 16:33:15

Answer 2
1 2016-11-27 11:03:19

Answer 3
1 2020-01-07 10:41:33
