JSON input to Logstash: configuration issue?

Question

I have the following JSON input that I want to load into Logstash (and eventually search and dashboard in Elasticsearch/Kibana).

{"vulnerabilities":[
    {"ip":"10.1.1.1","dns":"z.acme.com","vid":"12345"},
    {"ip":"10.1.1.2","dns":"y.acme.com","vid":"12345"},
    {"ip":"10.1.1.3","dns":"x.acme.com","vid":"12345"}
]}

I'm using the following Logstash configuration:

input {
  file {
    path => "/tmp/logdump/*"
    type => "assets"
    codec => "json"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch { host => localhost }
}

Output:

{
       "message" => "{\"vulnerabilities\":[\r",
      "@version" => "1",
    "@timestamp" => "2014-10-30T23:41:19.788Z",
          "type" => "assets",
          "host" => "av12612sn00-pn9",
          "path" => "/tmp/logdump/stack3.json"
}
{
       "message" => "{\"ip\":\"10.1.1.30\",\"dns\":\"z.acme.com\",\"vid\":\"12345\"},\r",
      "@version" => "1",
    "@timestamp" => "2014-10-30T23:41:19.838Z",
          "type" => "assets",
          "host" => "av12612sn00-pn9",
          "path" => "/tmp/logdump/stack3.json"
}
{
       "message" => "{\"ip\":\"10.1.1.31\",\"dns\":\"y.acme.com\",\"vid\":\"12345\"},\r",
      "@version" => "1",
    "@timestamp" => "2014-10-30T23:41:19.870Z",
          "type" => "shellshock",
          "host" => "av1261wag2sn00-pn9",
          "path" => "/tmp/logdump/stack3.json"
}
{
            "ip" => "10.1.1.32",
           "dns" => "x.acme.com",
           "vid" => "12345",
      "@version" => "1",
    "@timestamp" => "2014-10-30T23:41:19.884Z",
          "type" => "assets",
          "host" => "av12612sn00-pn9",
          "path" => "/tmp/logdump/stack3.json"
}

Obviously Logstash is treating each line as an event: it thinks {"vulnerabilities":[ is an event, I'm guessing the trailing commas on the two subsequent nodes break the parsing, and only the last node appears correct. How do I tell Logstash to parse the events inside the vulnerabilities array and to ignore the commas at the end of each line?
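
The behavior described above can be reproduced outside Logstash. The following Python sketch is only an illustration, assuming the file input hands each physical line to the json codec on its own; parse_line is a hypothetical helper, not part of Logstash.

```python
import json

# The sample document from the question, one string per physical line.
LINES = [
    '{"vulnerabilities":[',
    '    {"ip":"10.1.1.1","dns":"z.acme.com","vid":"12345"},',
    '    {"ip":"10.1.1.2","dns":"y.acme.com","vid":"12345"},',
    '    {"ip":"10.1.1.3","dns":"x.acme.com","vid":"12345"}',
    ']}',
]


def parse_line(line):
    """Return the parsed object, or None when the line is not valid JSON by itself."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


results = [parse_line(line) for line in LINES]
for line, parsed in zip(LINES, results):
    print("parsed" if parsed is not None else "failed", "|", line.strip())
```

Only the line without a trailing comma is a complete JSON value, which matches the single event in the output above that has ip, dns and vid as separate fields.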

Updated 2014-11-05: Following Magnus' recommendations, I added the json filter and it's working perfectly. However, it would not parse the last line of the JSON correctly unless I specified start_position => "beginning" in the file input block. Any ideas why not? I know it parses bottom up by default, but I would have expected the mutate/gsub to handle this smoothly.

input {
  file {
    path => "/tmp/logdump/*"
    type => "assets"
    start_position => "beginning"
  }
}
filter {
  if [message] =~ /^\[?{"ip":/ {
    mutate {
      gsub => [
        "message", "^\[{", "{",
        "message", "},?\]?$", "}"
      ]
    }
    json {
      source => "message"
      remove_field => ["message"]
    }
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch { host => localhost }
}
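
To see what the conditional and the two gsub substitutions in this configuration do to an individual line, here is a rough Python equivalent. It is a sketch, not Logstash code: clean_line is a hypothetical name, and Python's re module stands in for Logstash's Ruby regular expressions, on the assumption that both treat these simple patterns the same way.

```python
import json
import re

GUARD = re.compile(r'^\[?\{"ip":')   # same test as the `if [message] =~ ...` condition
LEADING = re.compile(r'^\[\{')       # first gsub: "[{" at line start becomes "{"
TRAILING = re.compile(r'\},?\]?$')   # second gsub: "}", "},", "}]" or "},]" at line end becomes "}"


def clean_line(message):
    """Mimic the conditional, both substitutions, and the json filter for one line."""
    if not GUARD.search(message):
        return None  # the config leaves such events untouched; this sketch just skips them
    message = LEADING.sub("{", message)
    message = TRAILING.sub("}", message)
    return json.loads(message)


first = clean_line('[{"ip":"10.1.1.1","dns":"z.acme.com","vid":"12345"},')
last = clean_line('{"ip":"10.1.1.3","dns":"x.acme.com","vid":"12345"}]')
print(first)
print(last)
```

Note that the guard pattern only matches lines that begin with { or [{, so this variant assumes the records are not indented.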

Answer 1

You could skip the json codec and use a multiline filter to join the message into a single string that you can feed to the json filter.

filter {
  multiline {
    pattern => '^{"vulnerabilities":\['
    negate => true
    what => "previous"
  }
  json {
    source => "message"
  }
}

However, this produces the following unwanted result:

{
            "message" => "<omitted for brevity>",
           "@version" => "1",
         "@timestamp" => "2014-10-31T06:48:15.589Z",
               "host" => "name-of-your-host",
               "tags" => [
        [0] "multiline"
    ],
    "vulnerabilities" => [
        [0] {
             "ip" => "10.1.1.1",
            "dns" => "z.acme.com",
            "vid" => "12345"
        },
        [1] {
             "ip" => "10.1.1.2",
            "dns" => "y.acme.com",
            "vid" => "12345"
        },
        [2] {
             "ip" => "10.1.1.3",
            "dns" => "x.acme.com",
            "vid" => "12345"
        }
    ]
}

Unless there's a fixed number of elements in the vulnerabilities array, I don't think there's much we can do with this (without resorting to the ruby filter).
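
For reference, the per-element fan-out that such a ruby filter would have to perform looks roughly like this. The sketch is written in Python purely to illustrate the logic; fan_out and the event dictionaries are hypothetical and do not correspond to any Logstash API.

```python
def fan_out(event, field="vulnerabilities"):
    """Yield one flat event per element of event[field], copying the shared metadata."""
    shared = {key: value for key, value in event.items() if key != field}
    for item in event.get(field, []):
        child = dict(shared)   # shallow copy so each child event is independent
        child.update(item)     # promote ip/dns/vid to top-level fields
        yield child


merged = {
    "@version": "1",
    "host": "name-of-your-host",
    "vulnerabilities": [
        {"ip": "10.1.1.1", "dns": "z.acme.com", "vid": "12345"},
        {"ip": "10.1.1.2", "dns": "y.acme.com", "vid": "12345"},
        {"ip": "10.1.1.3", "dns": "x.acme.com", "vid": "12345"},
    ],
}

events = list(fan_out(merged))
for child_event in events:
    print(child_event)
```

Each resulting event carries the shared fields plus one record's ip, dns and vid, which is the shape the question asks for.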

How about applying the json filter only to lines that look like what we want and dropping the rest? Your question doesn't make it clear whether the whole log looks like this, so this may not be very useful.

filter {
  if [message] =~ /^\s+{"ip":/ {
    # Remove trailing commas
    mutate {
      gsub => ["message", ",$", ""]
    }
    json {
      source => "message"
      remove_field => ["message"]
    }
  } else {
    drop {}
  }
}
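
The keep-or-drop logic of this filter can be sketched in Python as follows. This is an illustration under assumptions: filter_line is a hypothetical helper, and the lines are assumed to carry no trailing carriage return (the output in the question shows a \r at the end of each message, which the ,$ pattern would not strip).

```python
import json
import re

WANTED = re.compile(r'^\s+\{"ip":')   # indented lines that start an "ip" object
TRAILING_COMMA = re.compile(r',$')    # same pattern as the gsub in the config


def filter_line(message):
    """Return the parsed fields for a wanted line, or None to signal 'drop'."""
    if not WANTED.search(message):
        return None
    return json.loads(TRAILING_COMMA.sub("", message))


raw = """{"vulnerabilities":[
    {"ip":"10.1.1.1","dns":"z.acme.com","vid":"12345"},
    {"ip":"10.1.1.2","dns":"y.acme.com","vid":"12345"},
    {"ip":"10.1.1.3","dns":"x.acme.com","vid":"12345"}
]}"""

kept = [fields for fields in map(filter_line, raw.splitlines()) if fields is not None]
print([fields["ip"] for fields in kept])
```

Run against the sample from the question, this keeps the three address records and drops the opening and closing lines.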

Answer 1: 5, accepted, 2014-10-31 06:59:03
