繁体   English   中英

Logstash索引JSON数组

[英]Logstash indexing JSON arrays

Logstash太棒了。 我可以像这样发送JSON(多行可读性):

{
  "a": "one"
  "b": {
    "alpha":"awesome"
  }
}

然后使用搜索词b.alpha:awesome在kibana中查询该行b.alpha:awesome 尼斯。

但是我现在有一个像这样的JSON日志行:

{
  "different":[
    {
      "this": "one",
      "that": "uno"
    },
    {
      "this": "two"
    }
  ]
}

而且我希望能够通过different.this:two的搜索找到这一行。这different.this:two (或者different.this:one ,或者different.that:uno

如果我直接使用Lucene,我会遍历different数组,并为其中的每个哈希生成一个新的搜索索引,但Logstash目前似乎像这样摄取该行:

不同:{this:one,that:uno},{this:two}

这不会帮助我搜索使用different.this日志行。这个或different.that

我是否有任何关于编解码器,过滤器或代码更改的想法,我可以做到这一点?

您可以编写自己的过滤器 (复制和粘贴,重命名类名, config_name filter(event)方法)或修改当前的JSON过滤器(Github上的

您可以在以下路径logstash-1.xx\\lib\\logstash\\filters找到JSON过滤器(Ruby类)源代码,命名为json.rb JSON过滤器将内容解析为JSON,如下所示

begin
  # TODO(sissel): Note, this will not successfully handle json lists
  # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
  # which won't merge into a hash. If someone needs this, we can fix it
  # later.
  dest.merge!(JSON.parse(source))

  # If no target, we target the root of the event object. This can allow
  # you to overwrite @timestamp. If so, let's parse it as a timestamp!
  if !@target && event[TIMESTAMP].is_a?(String)
    # This is a hack to help folks who are mucking with @timestamp during
    # their json filter. You aren't supposed to do anything with
    # "@timestamp" outside of the date filter, but nobody listens... ;)
    event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
  end

  filter_matched(event)
rescue => e
  event.tag("_jsonparsefailure")
  @logger.warn("Trouble parsing json", :source => @source,
               :raw => event[@source], :exception => e)
  return
end

您可以修改解析过程以修改原始JSON

  json  = JSON.parse(source)
  if json.is_a?(Hash)
    json.each do |key, value| 
        if value.is_a?(Array)
            value.each_with_index do |object, index|
                #modify as you need
                object["index"]=index
            end
        end
    end
  end
  #save modified json
  ......
  dest.merge!(json)

然后你可以修改你的配置文件以使用/你的新/修改过的JSON过滤器并放在\\logstash-1.xx\\lib\\logstash\\config

这是我elastic_with_json.conf与修改json.rb过滤器

input{
    stdin{

    }
}filter{
    json{
        source => "message"
    }
}output{
    elasticsearch{
        host=>localhost
    }stdout{

    }
}

如果要使用新过滤器,可以使用config_name对其进行配置

class LogStash::Filters::Json_index < LogStash::Filters::Base

  config_name "json_index"
  milestone 2
  ....
end

并配置它

input{
    stdin{

    }
}filter{
    json_index{
        source => "message"
    }
}output{
    elasticsearch{
        host=>localhost
    }stdout{

    }
}

希望这可以帮助。

对于快速而肮脏的黑客,我使用Ruby过滤器和下面的代码,不再需要使用开箱即用的'json'过滤器了

input {
  stdin{}
}

filter {
  grok {
    match => ["message","(?<json_raw>.*)"]
  }
  ruby {
    init => "
      def parse_json obj, pname=nil, event
         obj = JSON.parse(obj) unless obj.is_a? Hash
         obj = obj.to_hash unless obj.is_a? Hash

         obj.each {|k,v|
         p = pname.nil?? k : pname
         if v.is_a? Array
           v.each_with_index {|oo,ii|               
             parse_json_array(oo,ii,p,event)
           }
           elsif v.is_a? Hash
             parse_json(v,p,event)
           else
             p = pname.nil?? k : [pname,k].join('.')
             event[p] = v
           end
         }
        end

        def parse_json_array obj, i,pname, event
          obj = JSON.parse(obj) unless obj.is_a? Hash
          pname_ = pname
          if obj.is_a? Hash
            obj.each {|k,v|
              p=[pname_,i,k].join('.')
              if v.is_a? Array
                v.each_with_index {|oo,ii|
                  parse_json_array(oo,ii,p,event)
                }
              elsif v.is_a? Hash
                parse_json(v,p, event)
              else
                event[p] = v
              end
            }
          else
            n = [pname_, i].join('.')
            event[n] = obj
          end
        end
      "
      code => "parse_json(event['json_raw'].to_s,nil,event) if event['json_raw'].to_s.include? ':'"
    }


  }

output {
  stdout{codec => rubydebug}
}

测试json结构

{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}

这是什么输出

      {
           "message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
          "@version" => "1",
        "@timestamp" => "2014-07-25T00:06:00.814Z",
              "host" => "Leis-MacBook-Pro.local",
          "json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
                "id" => 123,
       "members.0.i" => 1,
"members.0.arr.0.ii" => 11,
"members.0.arr.1.ii" => 22,
       "members.1.i" => 2,
           "im_json" => 234,
       "im_json.0.i" => 3,
       "im_json.1.i" => 4
      }

我喜欢的解决方案是ruby过滤器,因为这要求我们不要编写另一个过滤器。 但是,该解决方案会创建位于JSON“根”上的字段,并且很难跟踪原始文档的外观。

我提出了类似的东西,更容易遵循,是一个递归的解决方案,所以它更清洁。

ruby {
    init => "
        def arrays_to_hash(h)
          h.each do |k,v|
            # If v is nil, an array is being iterated and the value is k.
            # If v is not nil, a hash is being iterated and the value is v.
            value = v || k
            if value.is_a?(Array)
                # "value" is replaced with "value_hash" later.
                value_hash = {}
                value.each_with_index do |v, i|
                    value_hash[i.to_s] = v
                end
                h[k] = value_hash
            end

            if value.is_a?(Hash) || value.is_a?(Array)
              arrays_to_hash(value)
            end
          end
        end
      "
      code => "arrays_to_hash(event.to_hash)"
}

它将数组转换为has,每个键作为索引号。 更多细节: - http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM