簡體   English   中英

Logstash索引JSON數組

[英]Logstash indexing JSON arrays

Logstash太棒了。 我可以像這樣發送JSON(多行可讀性):

{
  "a": "one"
  "b": {
    "alpha":"awesome"
  }
}

然后使用搜索詞b.alpha:awesome在kibana中查詢該行b.alpha:awesome 尼斯。

但是我現在有一個像這樣的JSON日志行:

{
  "different":[
    {
      "this": "one",
      "that": "uno"
    },
    {
      "this": "two"
    }
  ]
}

而且我希望能夠通過different.this:two的搜索找到這一行。這different.this:two (或者different.this:one ,或者different.that:uno

如果我直接使用Lucene,我會遍歷different數組,並為其中的每個哈希生成一個新的搜索索引,但Logstash目前似乎像這樣攝取該行:

不同:{this:one,that:uno},{this:two}

這不會幫助我搜索使用different.this日志行。這個或different.that

我是否有任何關於編解碼器,過濾器或代碼更改的想法,我可以做到這一點?

您可以編寫自己的過濾器 (復制和粘貼,重命名類名, config_name filter(event)方法)或修改當前的JSON過濾器(Github上的

您可以在以下路徑logstash-1.xx\\lib\\logstash\\filters找到JSON過濾器(Ruby類)源代碼,命名為json.rb JSON過濾器將內容解析為JSON,如下所示

begin
  # TODO(sissel): Note, this will not successfully handle json lists
  # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
  # which won't merge into a hash. If someone needs this, we can fix it
  # later.
  dest.merge!(JSON.parse(source))

  # If no target, we target the root of the event object. This can allow
  # you to overwrite @timestamp. If so, let's parse it as a timestamp!
  if !@target && event[TIMESTAMP].is_a?(String)
    # This is a hack to help folks who are mucking with @timestamp during
    # their json filter. You aren't supposed to do anything with
    # "@timestamp" outside of the date filter, but nobody listens... ;)
    event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
  end

  filter_matched(event)
rescue => e
  event.tag("_jsonparsefailure")
  @logger.warn("Trouble parsing json", :source => @source,
               :raw => event[@source], :exception => e)
  return
end

您可以修改解析過程以修改原始JSON

  json  = JSON.parse(source)
  if json.is_a?(Hash)
    json.each do |key, value| 
        if value.is_a?(Array)
            value.each_with_index do |object, index|
                #modify as you need
                object["index"]=index
            end
        end
    end
  end
  #save modified json
  ......
  dest.merge!(json)

然后你可以修改你的配置文件以使用/你的新/修改過的JSON過濾器並放在\\logstash-1.xx\\lib\\logstash\\config

這是我elastic_with_json.conf與修改json.rb過濾器

input{
    stdin{

    }
}filter{
    json{
        source => "message"
    }
}output{
    elasticsearch{
        host=>localhost
    }stdout{

    }
}

如果要使用新過濾器,可以使用config_name對其進行配置

class LogStash::Filters::Json_index < LogStash::Filters::Base

  config_name "json_index"
  milestone 2
  ....
end

並配置它

input{
    stdin{

    }
}filter{
    json_index{
        source => "message"
    }
}output{
    elasticsearch{
        host=>localhost
    }stdout{

    }
}

希望這可以幫助。

對於快速而骯臟的黑客,我使用Ruby過濾器和下面的代碼,不再需要使用開箱即用的'json'過濾器了

input {
  stdin{}
}

filter {
  grok {
    match => ["message","(?<json_raw>.*)"]
  }
  ruby {
    init => "
      def parse_json obj, pname=nil, event
         obj = JSON.parse(obj) unless obj.is_a? Hash
         obj = obj.to_hash unless obj.is_a? Hash

         obj.each {|k,v|
         p = pname.nil?? k : pname
         if v.is_a? Array
           v.each_with_index {|oo,ii|               
             parse_json_array(oo,ii,p,event)
           }
           elsif v.is_a? Hash
             parse_json(v,p,event)
           else
             p = pname.nil?? k : [pname,k].join('.')
             event[p] = v
           end
         }
        end

        def parse_json_array obj, i,pname, event
          obj = JSON.parse(obj) unless obj.is_a? Hash
          pname_ = pname
          if obj.is_a? Hash
            obj.each {|k,v|
              p=[pname_,i,k].join('.')
              if v.is_a? Array
                v.each_with_index {|oo,ii|
                  parse_json_array(oo,ii,p,event)
                }
              elsif v.is_a? Hash
                parse_json(v,p, event)
              else
                event[p] = v
              end
            }
          else
            n = [pname_, i].join('.')
            event[n] = obj
          end
        end
      "
      code => "parse_json(event['json_raw'].to_s,nil,event) if event['json_raw'].to_s.include? ':'"
    }


  }

output {
  stdout{codec => rubydebug}
}

測試json結構

{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}

這是什么輸出

      {
           "message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
          "@version" => "1",
        "@timestamp" => "2014-07-25T00:06:00.814Z",
              "host" => "Leis-MacBook-Pro.local",
          "json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
                "id" => 123,
       "members.0.i" => 1,
"members.0.arr.0.ii" => 11,
"members.0.arr.1.ii" => 22,
       "members.1.i" => 2,
           "im_json" => 234,
       "im_json.0.i" => 3,
       "im_json.1.i" => 4
      }

我喜歡的解決方案是ruby過濾器,因為這要求我們不要編寫另一個過濾器。 但是,該解決方案會創建位於JSON“根”上的字段,並且很難跟蹤原始文檔的外觀。

我提出了類似的東西,更容易遵循,是一個遞歸的解決方案,所以它更清潔。

ruby {
    init => "
        def arrays_to_hash(h)
          h.each do |k,v|
            # If v is nil, an array is being iterated and the value is k.
            # If v is not nil, a hash is being iterated and the value is v.
            value = v || k
            if value.is_a?(Array)
                # "value" is replaced with "value_hash" later.
                value_hash = {}
                value.each_with_index do |v, i|
                    value_hash[i.to_s] = v
                end
                h[k] = value_hash
            end

            if value.is_a?(Hash) || value.is_a?(Array)
              arrays_to_hash(value)
            end
          end
        end
      "
      code => "arrays_to_hash(event.to_hash)"
}

它將數組轉換為has,每個鍵作為索引號。 更多細節: - http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM