Logstash indexing JSON arrays
Logstash is awesome. I can send it JSON like this (multi-lined for readability):
{
  "a": "one",
  "b": {
    "alpha": "awesome"
  }
}
And then query for that line in Kibana using the search term b.alpha:awesome. Nice.
However, I now have a JSON log line like this:
{
  "different": [
    {
      "this": "one",
      "that": "uno"
    },
    {
      "this": "two"
    }
  ]
}
And I'd like to be able to find this line with a search like different.this:two (or different.this:one, or different.that:uno).
If I was using Lucene directly I'd iterate through the different array and generate a new search index for each hash within it, but Logstash currently seems to ingest that line like this:

different: {this: one, that: uno}, {this: two}
Which isn't going to help me search for log lines using different.this or different.that.
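In plain Ruby, the kind of expansion I'm after would look something like this (a sketch only; the flatten_for_search helper is my own illustration, not something Logstash provides):

```ruby
require 'json'

# Hypothetical helper: expand each hash inside an array field into
# dotted keys, collecting every element's value under "field.key".
def flatten_for_search(doc)
  out = {}
  doc.each do |field, value|
    if value.is_a?(Array)
      value.each do |element|
        element.each { |k, v| (out["#{field}.#{k}"] ||= []) << v }
      end
    else
      out[field] = value
    end
  end
  out
end

line = '{"different":[{"this":"one","that":"uno"},{"this":"two"}]}'
flat = flatten_for_search(JSON.parse(line))
# flat["different.this"] holds both "one" and "two", so a query on
# different.this:two could match this log line.
```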
Has anyone got any thoughts on a codec, filter, or code change I can make to enable this?
You can write your own filter (copy & paste, rename the class name and the config_name, and rewrite the filter(event) method), or modify the current JSON filter (source on GitHub).
You can find the JSON filter (Ruby class) source code in the path logstash-1.xx\lib\logstash\filters, in a file named json.rb.
The JSON filter parses the content as JSON, as follows:
begin
  # TODO(sissel): Note, this will not successfully handle json lists
  # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
  # which won't merge into a hash. If someone needs this, we can fix it
  # later.
  dest.merge!(JSON.parse(source))

  # If no target, we target the root of the event object. This can allow
  # you to overwrite @timestamp. If so, let's parse it as a timestamp!
  if !@target && event[TIMESTAMP].is_a?(String)
    # This is a hack to help folks who are mucking with @timestamp during
    # their json filter. You aren't supposed to do anything with
    # "@timestamp" outside of the date filter, but nobody listens... ;)
    event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
  end

  filter_matched(event)
rescue => e
  event.tag("_jsonparsefailure")
  @logger.warn("Trouble parsing json", :source => @source,
               :raw => event[@source], :exception => e)
  return
end
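The TODO comment in that snippet is easy to reproduce on its own: a top-level JSON list parses to a Ruby array, and Hash#merge! raises a TypeError when handed one, which is why the filter ends up in the rescue branch that tags _jsonparsefailure. A minimal sketch:

```ruby
require 'json'

dest = {}
dest.merge!(JSON.parse('{"a": 1}'))     # a hash merges cleanly

failed = false
begin
  dest.merge!(JSON.parse('[1, 2, 3]'))  # an array cannot merge into a hash
rescue TypeError
  # In the filter, this is where the event would get tagged "_jsonparsefailure".
  failed = true
end
```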
You can modify the parsing procedure to transform the original JSON:
json = JSON.parse(source)

if json.is_a?(Hash)
  json.each do |key, value|
    if value.is_a?(Array)
      value.each_with_index do |object, index|
        # modify as you need
        object["index"] = index
      end
    end
  end
end

# save the modified json
# ......
dest.merge!(json)
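Applied to the question's log line, that modification simply stamps each array element with its position before the merge (the "index" key name comes from the snippet above; the rest is a standalone sketch):

```ruby
require 'json'

source = '{"different":[{"this":"one","that":"uno"},{"this":"two"}]}'
json = JSON.parse(source)

if json.is_a?(Hash)
  json.each do |key, value|
    next unless value.is_a?(Array)
    value.each_with_index do |object, index|
      object["index"] = index  # mark each element with its array position
    end
  end
end
# json["different"] is now:
#   [{"this"=>"one", "that"=>"uno", "index"=>0}, {"this"=>"two", "index"=>1}]
```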
Then you can modify your config file to use your new/modified JSON filter, and place the config in \logstash-1.xx\lib\logstash\config.
This is my elastic_with_json.conf with the modified json.rb filter:
input {
  stdin {
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
If you want to use your new filter, you can configure it with its config_name:
class LogStash::Filters::Json_index < LogStash::Filters::Base
  config_name "json_index"
  milestone 2
  ....
end
and configure it:
input {
  stdin {
  }
}
filter {
  json_index {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
Hope this helps.
For a quick and dirty hack, I used the Ruby filter and the code below; there's no need to use the out-of-the-box 'json' filter anymore:
input {
  stdin {}
}
filter {
  grok {
    match => ["message", "(?<json_raw>.*)"]
  }
  ruby {
    init => "
      def parse_json obj, pname=nil, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        obj = obj.to_hash unless obj.is_a? Hash
        obj.each {|k, v|
          p = pname.nil? ? k : pname
          if v.is_a? Array
            v.each_with_index {|oo, ii|
              parse_json_array(oo, ii, p, event)
            }
          elsif v.is_a? Hash
            parse_json(v, p, event)
          else
            p = pname.nil? ? k : [pname, k].join('.')
            event[p] = v
          end
        }
      end

      def parse_json_array obj, i, pname, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        pname_ = pname
        if obj.is_a? Hash
          obj.each {|k, v|
            p = [pname_, i, k].join('.')
            if v.is_a? Array
              v.each_with_index {|oo, ii|
                parse_json_array(oo, ii, p, event)
              }
            elsif v.is_a? Hash
              parse_json(v, p, event)
            else
              event[p] = v
            end
          }
        else
          n = [pname_, i].join('.')
          event[n] = obj
        end
      end
    "
    code => "parse_json(event['json_raw'].to_s, nil, event) if event['json_raw'].to_s.include? ':'"
  }
}
output {
  stdout { codec => rubydebug }
}
Test JSON structure:
{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}
and this is what's output:
{
    "message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
    "@version" => "1",
    "@timestamp" => "2014-07-25T00:06:00.814Z",
    "host" => "Leis-MacBook-Pro.local",
    "json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
    "id" => 123,
    "members.0.i" => 1,
    "members.0.arr.0.ii" => 11,
    "members.0.arr.1.ii" => 22,
    "members.1.i" => 2,
    "im_json" => 234,
    "im_json.0.i" => 3,
    "im_json.1.i" => 4
}
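Stripped of the grok and event plumbing, the flattening those init functions perform can be sketched in plain Ruby; here the event is just a hash, and the simplified recursion below reproduces the dotted array keys (it sidesteps the nested-hash prefix quirk visible in the im_json fields above):

```ruby
require 'json'

# Simplified recursive flattener: nested hashes become "a.b" keys and
# array elements become "a.0", "a.1", ... like the filter output above.
def flatten_json(obj, prefix, out)
  case obj
  when Hash
    obj.each { |k, v| flatten_json(v, prefix ? "#{prefix}.#{k}" : k, out) }
  when Array
    obj.each_with_index { |v, i| flatten_json(v, "#{prefix}.#{i}", out) }
  else
    out[prefix] = obj
  end
end

event = {}
flatten_json(JSON.parse('{"id":123, "members":[{"i":1},{"i":2}]}'), nil, event)
# event => {"id"=>123, "members.0.i"=>1, "members.1.i"=>2}
```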
The solution I liked is the Ruby filter, because it doesn't require writing another filter. However, that solution creates fields at the "root" of the JSON, and it's hard to keep track of how the original document looked.
I came up with something similar that's easier to follow, and it's a recursive solution, so it's cleaner.
ruby {
  init => "
    def arrays_to_hash(h)
      h.each do |k, v|
        # If v is nil, an array is being iterated and the value is k.
        # If v is not nil, a hash is being iterated and the value is v.
        value = v || k
        if value.is_a?(Array)
          # 'value' is replaced with 'value_hash' later.
          value_hash = {}
          value.each_with_index do |v, i|
            value_hash[i.to_s] = v
          end
          h[k] = value_hash
        end
        if value.is_a?(Hash) || value.is_a?(Array)
          arrays_to_hash(value)
        end
      end
    end
  "
  code => "arrays_to_hash(event.to_hash)"
}
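Outside the filter, the same function can be exercised directly; with the question's line, each array becomes a hash keyed by its stringified index, so the nesting survives and Elasticsearch sees paths like different.0.this (the definition below mirrors the init string above, with the demo input taken from the question):

```ruby
require 'json'

# Same logic as the filter's init block: replace every array in the
# (mutable) hash with a hash whose keys are the stringified indices.
def arrays_to_hash(h)
  h.each do |k, v|
    value = v || k  # array iteration yields (element, nil); hash yields (key, value)
    if value.is_a?(Array)
      value_hash = {}
      value.each_with_index { |elem, i| value_hash[i.to_s] = elem }
      h[k] = value_hash
    end
    arrays_to_hash(value) if value.is_a?(Hash) || value.is_a?(Array)
  end
end

event = JSON.parse('{"different":[{"this":"one","that":"uno"},{"this":"two"}]}')
arrays_to_hash(event)
# event => {"different"=>{"0"=>{"this"=>"one", "that"=>"uno"}, "1"=>{"this"=>"two"}}}
```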
It converts arrays to hashes, with each key being the index number. More details: http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html