Logstash indexing JSON arrays
Logstash is awesome. I can send it JSON like this (multi-line for readability):
{
  "a": "one",
  "b": {
    "alpha": "awesome"
  }
}
And then query for that line in Kibana using the search term b.alpha:awesome. Nice.
But I now have a JSON log line like this:
{
  "different": [
    {
      "this": "one",
      "that": "uno"
    },
    {
      "this": "two"
    }
  ]
}
And I would like to be able to find this line with a search like different.this:two (or different.this:one, or different.that:uno).
If I were using Lucene directly, I would iterate through the different array and generate a new search index for each hash within it, but Logstash currently seems to ingest that line like this:

different: {this: one, that: uno}, {this: two}
This won't help me search the log lines using different.this or different.that.
Any thoughts on codecs, filters, or code changes I can make to accomplish this?
You can write your own filter (copy and paste an existing one, rename the class name and the config_name, and rewrite the filter(event) method) or modify the current JSON filter (source on GitHub).
You can find the JSON filter (a Ruby class) source code, named json.rb, under the path logstash-1.xx\lib\logstash\filters.
The JSON filter parses the content as JSON as follows:
begin
  # TODO(sissel): Note, this will not successfully handle json lists
  # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
  # which won't merge into a hash. If someone needs this, we can fix it
  # later.
  dest.merge!(JSON.parse(source))

  # If no target, we target the root of the event object. This can allow
  # you to overwrite @timestamp. If so, let's parse it as a timestamp!
  if !@target && event[TIMESTAMP].is_a?(String)
    # This is a hack to help folks who are mucking with @timestamp during
    # their json filter. You aren't supposed to do anything with
    # "@timestamp" outside of the date filter, but nobody listens... ;)
    event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
  end

  filter_matched(event)
rescue => e
  event.tag("_jsonparsefailure")
  @logger.warn("Trouble parsing json", :source => @source,
               :raw => event[@source], :exception => e)
  return
end
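The TODO in that snippet can be reproduced outside Logstash. A minimal plain-Ruby sketch (not filter code) showing why a top-level JSON array cannot be merged into the event hash:

```ruby
require 'json'

dest = {}

# A top-level JSON object parses to a Hash and merges cleanly.
dest.merge!(JSON.parse('{"a": "one"}'))
puts dest["a"]  # prints: one

# A top-level JSON array parses to an Array, which Hash#merge!
# rejects -- exactly the case the TODO comment warns about.
begin
  dest.merge!(JSON.parse('[1, 2, 3]'))
rescue TypeError => e
  puts "merge failed: #{e.class}"  # prints: merge failed: TypeError
end
```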
You can modify the parsing procedure to adjust the original JSON before it is merged:
json = JSON.parse(source)
if json.is_a?(Hash)
  json.each do |key, value|
    if value.is_a?(Array)
      value.each_with_index do |object, index|
        # modify as you need
        object["index"] = index
      end
    end
  end
end
# save modified json
......
dest.merge!(json)
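Pulled out of the filter, that loop can be exercised standalone. A sketch where `source` stands in for the event's raw JSON string and `dest` for the event hash:

```ruby
require 'json'

source = '{"different":[{"this":"one","that":"uno"},{"this":"two"}]}'

json = JSON.parse(source)
if json.is_a?(Hash)
  json.each do |key, value|
    if value.is_a?(Array)
      value.each_with_index do |object, index|
        # Tag each array element with its position, as in the
        # modified filter above.
        object["index"] = index
      end
    end
  end
end

dest = {}
dest.merge!(json)
puts dest["different"][1]["index"]  # prints: 1
```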
Then you can modify your config file to use your new/modified JSON filter, and put it under \logstash-1.xx\lib\logstash\config.
Here is my elastic_with_json.conf with the modified json.rb filter:
input {
  stdin {
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
If you want to use your new filter, you can configure it with a config_name:
class LogStash::Filters::Json_index < LogStash::Filters::Base
  config_name "json_index"
  milestone 2
  ....
end
and configure it:
input {
  stdin {
  }
}
filter {
  json_index {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
Hope this helps.
對於快速而骯臟的黑客,我使用Ruby
過濾器和下面的代碼,不再需要使用開箱即用的'json'過濾器了
input {
  stdin{}
}
filter {
  grok {
    match => ["message","(?<json_raw>.*)"]
  }
  ruby {
    init => "
      def parse_json obj, pname=nil, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        obj = obj.to_hash unless obj.is_a? Hash
        obj.each {|k,v|
          p = pname.nil?? k : pname
          if v.is_a? Array
            v.each_with_index {|oo,ii|
              parse_json_array(oo,ii,p,event)
            }
          elsif v.is_a? Hash
            parse_json(v,p,event)
          else
            p = pname.nil?? k : [pname,k].join('.')
            event[p] = v
          end
        }
      end

      def parse_json_array obj, i, pname, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        pname_ = pname
        if obj.is_a? Hash
          obj.each {|k,v|
            p = [pname_,i,k].join('.')
            if v.is_a? Array
              v.each_with_index {|oo,ii|
                parse_json_array(oo,ii,p,event)
              }
            elsif v.is_a? Hash
              parse_json(v,p,event)
            else
              event[p] = v
            end
          }
        else
          n = [pname_,i].join('.')
          event[n] = obj
        end
      end
    "
    code => "parse_json(event['json_raw'].to_s,nil,event) if event['json_raw'].to_s.include? ':'"
  }
}
output {
  stdout { codec => rubydebug }
}
Test JSON structure:
{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}
And this is the output:
{
"message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
"@version" => "1",
"@timestamp" => "2014-07-25T00:06:00.814Z",
"host" => "Leis-MacBook-Pro.local",
"json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
"id" => 123,
"members.0.i" => 1,
"members.0.arr.0.ii" => 11,
"members.0.arr.1.ii" => 22,
"members.1.i" => 2,
"im_json" => 234,
"im_json.0.i" => 3,
"im_json.1.i" => 4
}
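The same dotted-key flattening can be sketched outside Logstash, with a plain hash in place of the event. This is a simplified variant of the `parse_json`/`parse_json_array` pair above, not the exact filter code; it joins every nesting level into the key, so nested hashes always get fully qualified names:

```ruby
require 'json'

# Recursively flatten nested hashes and arrays into dotted keys,
# mirroring what the Ruby-filter functions write into the event.
def flatten_json(obj, prefix, out)
  case obj
  when Hash
    obj.each { |k, v| flatten_json(v, prefix ? "#{prefix}.#{k}" : k.to_s, out) }
  when Array
    obj.each_with_index { |v, i| flatten_json(v, "#{prefix}.#{i}", out) }
  else
    out[prefix] = obj
  end
end

raw = '{"id":123, "members":[{"i":1},{"i":2}]}'
out = {}
flatten_json(JSON.parse(raw), nil, out)
puts out.inspect
```

With the input above, `out` contains id => 123, members.0.i => 1, and members.1.i => 2.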
The solution I liked is the Ruby filter, because it does not require us to write another filter. However, that solution creates fields that sit at the "root" of the JSON, and it is hard to keep track of how the original document looked.
I came up with something similar that is easier to follow, and it is a recursive solution, so it is cleaner.
ruby {
  init => "
    def arrays_to_hash(h)
      h.each do |k,v|
        # If v is nil, an array is being iterated and the value is k.
        # If v is not nil, a hash is being iterated and the value is v.
        value = v || k
        if value.is_a?(Array)
          # 'value' is replaced with 'value_hash' later.
          value_hash = {}
          value.each_with_index do |v, i|
            value_hash[i.to_s] = v
          end
          h[k] = value_hash
        end
        if value.is_a?(Hash) || value.is_a?(Array)
          arrays_to_hash(value)
        end
      end
    end
  "
  code => "arrays_to_hash(event.to_hash)"
}
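The helper can be checked on a plain hash outside Logstash. A sketch with the same `arrays_to_hash` logic, using a parsed JSON document in place of `event.to_hash`:

```ruby
require 'json'

# Same helper as in the filter's init block above.
def arrays_to_hash(h)
  h.each do |k, v|
    value = v || k
    if value.is_a?(Array)
      # Replace the array with a hash keyed by stringified index.
      value_hash = {}
      value.each_with_index { |elem, i| value_hash[i.to_s] = elem }
      h[k] = value_hash
    end
    arrays_to_hash(value) if value.is_a?(Hash) || value.is_a?(Array)
  end
end

doc = JSON.parse('{"different":[{"this":"one","that":"uno"},{"this":"two"}]}')
arrays_to_hash(doc)
puts doc["different"]["1"]["this"]  # prints: two
```

After the call, `doc["different"]` is a hash with keys "0" and "1" instead of an array, which Elasticsearch can index with dotted field names.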
It converts arrays into hashes, with each key being the index number. More details: http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html
Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site or the original source. For any questions, contact yoyou2525@163.com.