How can I keep Logstash from parsing certain fields?

Question

I have a log file that looks like this (simplified):

 { "startDate": "2015-05-27", "endDate": "2015-05-27", 
    "request" : {"requestId":"123","field2":1,"field2": 2,"field3":3, ....} }

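Logstash tries to parse every field, including the "request" field. Is it possible to leave this field unparsed?
I want to see the "request" field in Elasticsearch, but its contents should not be parsed.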

Here is part of my config file:

input {
    file {
        type => "json"
        path => [
                "/var/log/service/restapi.log"
        ]
        tags => ["restapi"]
    }
}

filter {
    ruby {
        init => "require 'socket'"
        code => "
           event['host'] = Socket.gethostname.gsub(/\..*/, '')
           event['request'] = (event['request'].to_s);
        "
    }

    if "restapi" in [tags] {
        json {
            source => "message"
        }
        date {
                match => [ "date_start", "yyyy-MM-dd HH:mm:ss" ]
                target => "date_start"
         }
        date {
                match => [ "date_end", "yyyy-MM-dd HH:mm:ss" ]
                target => "date_end"
        }
        date {
                match => [ "date", "yyyy-MM-dd HH:mm:ss" ]
                target => "date"
        }
    }
}
output {
    if "restapi" in [tags] {
        elasticsearch {
            hosts => ["......."]
            template_name => "logs"
            template => "/etc/logstash/templates/service.json"
            template_overwrite => true
            index => "service-logs-%{+YYYY.MM.dd}"
            idle_flush_time => 20
            flush_size => 500
        }
    }
}

Here is my template file:

{
  "template" : "service-*",
  "settings" : {
    "index": {
            "refresh_interval": "60s",
            "number_of_shards": 6,
            "number_of_replicas": 2
        }
  },
  "mappings" : {
    "logs" : {
        "properties" : {
        "@timestamp" : { "type" : "date", "format" : "dateOptionalTime" },
        "@version" : { "type" : "integer", "index" : "not_analyzed" },
        "message": { "type" : "string", "norms" : { "enabled" : false } },
        "method" : { "type" : "string", "index" : "not_analyzed" },
        "traffic_source" : { "type" : "string", "index" : "not_analyzed" },
        "request_path" : { "type" : "string", "index" : "not_analyzed" },
        "status" : { "type" : "integer", "index" : "not_analyzed" },
        "host_name" : { "type" : "string", "index" : "not_analyzed" },
        "environment" : { "type" : "string", "index" : "not_analyzed" },
        "action" : { "type" : "string", "index" : "not_analyzed" },
        "request_id" : { "type" : "string", "index" : "not_analyzed" },
        "date" : { "type" : "date", "format" : "dateOptionalTime" },
        "date_start" : { "type" : "date", "format" : "dateOptionalTime" },
        "date_end" : { "type" : "date", "format" : "dateOptionalTime" },
        "adnest_type" : { "type" : "string", "index" : "not_analyzed" },
        "request" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}

This is from logstash.log:

response=>{"create"=>{"_index"=>"logs-2017.02.08", "_type"=>"json", "_id"=>"AVoeNgdhD5iEO87EVF_n", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [request]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"unknown property [requestId]"}}}}, :level=>:warn}

Answer 1

You should be able to do this with a ruby filter:

filter {
    ruby {
        init => "require 'socket'"
        code => "
           event['host'] = Socket.gethostname.gsub(/\..*/, '')
           event['request'] = (event['request'].to_s);
        "
    }

    if "restapi" in [tags] {
        ruby {
                code => '
                    require "json"
                    event.set("request",event.get("request").to_json)'
        }
        date {
                match => [ "date_start", "yyyy-MM-dd HH:mm:ss" ]
                target => "date_start"
         }
        date {
                match => [ "date_end", "yyyy-MM-dd HH:mm:ss" ]
                target => "date_end"
        }
        date {
                match => [ "date", "yyyy-MM-dd HH:mm:ss" ]
                target => "date"
        }
    }
}

When I test this with a stubbed stdin/stdout configuration:

input {
 stdin { codec => json }
}
# the filter { } block from above goes here
output {
  stdout { codec=>rubydebug}
}

and run it like this:

echo '{ "startDate": "2015-05-27", "endDate": "2015-05-27", "request" : {"requestId":"123","field2":1,"field2": 2,"field3":3} }' | bin/logstash -f test.conf

it outputs this:

{
     "startDate" => "2015-05-27",
       "endDate" => "2015-05-27",
       "request" => "{\"requestId\"=>\"123\", \"field2\"=>2, \"field3\"=>3}",
      "@version" => "1",
    "@timestamp" => "2017-02-09T14:37:02.789Z",
          "host" => "xxxx"
}

So that answers your original question. If you don't know why your template isn't working, you should ask a separate question.
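One detail worth noting about the output above: the "request" string uses `=>` between keys and values, which is the form Ruby's `Hash#to_s` produces, not JSON. The following is a minimal sketch in plain Ruby (no Logstash involved) of how `to_s` and `to_json` differ. The `request` hash is only a stand-in for the value that `event.get("request")` would return, and it assumes the event has already been decoded from JSON, as it is with the stdin json codec in the test.

```ruby
require "json"

# Stand-in for the decoded "request" field (hypothetical sample data
# modelled on the log line in the question).
request = { "requestId" => "123", "field2" => 2, "field3" => 3 }

as_inspect = request.to_s     # Ruby inspect notation, uses => between key and value
as_json    = request.to_json  # JSON text: {"requestId":"123","field2":2,"field3":3}

puts as_inspect
puts as_json

# Only the to_json form can be decoded back into the original structure.
round_trip = JSON.parse(as_json)
puts round_trip == request
```

If the string is ever going to be decoded again downstream, the `to_json` form is probably the one you want. If it is only there to be read by a human, either form will do.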

Answer 2

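Elasticsearch analyzes the field by default. If all you need is for the request field not to be analyzed, change how it is indexed by setting "index": "not_analyzed" in the field's mapping.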

More information is available in the Elasticsearch mapping documentation.
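Note that the template in the question already maps "request" as a not_analyzed string, and judging by the log there, Elasticsearch still rejects the document. That setting controls how string values are indexed. It does not turn an object into a string. Here is a rough sketch in plain Ruby (no Elasticsearch client, hypothetical sample data) of the two document shapes involved:

```ruby
require "json"

# Hypothetical nested payload, modelled on the question's log line.
nested = { "requestId" => "123", "field3" => 3 }

# Document as a JSON parse would produce it: "request" is an object.
doc_object = { "startDate" => "2015-05-27", "request" => nested }

# Document after serializing the field: "request" is a single string.
doc_string = doc_object.merge("request" => nested.to_json)

puts JSON.generate(doc_object)
puts JSON.generate(doc_string)
```

Only the second shape matches a mapping that declares "request" as a string, so the field has to be serialized in Logstash before the event is sent.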

Answer 1: score 1, accepted, 2017-02-08 15:01:41

Answer 2: score 0, 2017-02-08 13:53:41
