Logstash indexing JSON arrays
Logstash is awesome. I can send it JSON like this (multi-lined for readability):
{
  "a": "one",
  "b": {
    "alpha": "awesome"
  }
}
And then query for that line in Kibana using the search term b.alpha:awesome. Nice.
But I now have a JSON log line like this:
{
"different":[
{
"this": "one",
"that": "uno"
},
{
"this": "two"
}
]
}
And I would like to be able to find this line with a search for different.this:two (or different.this:one, or different.that:uno).
If I were using Lucene directly, I would iterate through the different array and generate a new search index entry for each hash inside it, but Logstash currently seems to ingest that line like this:

different: {this: one, that: uno}, {this: two}
Which isn't going to help me search the log line using different.this or different.that.
Any thoughts as to codecs, filters, or code changes I can make to enable this?
You can write your own filter (copy and paste, rename the class name and config_name, and rewrite the filter(event) method), or modify the current JSON filter (source on GitHub).

You can find the JSON filter (Ruby class) source code under the path logstash-1.xx\lib\logstash\filters, in a file named json.rb. The JSON filter parses the content as JSON as follows:
begin
  # TODO(sissel): Note, this will not successfully handle json lists
  # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
  # which won't merge into a hash. If someone needs this, we can fix it
  # later.
  dest.merge!(JSON.parse(source))

  # If no target, we target the root of the event object. This can allow
  # you to overwrite @timestamp. If so, let's parse it as a timestamp!
  if !@target && event[TIMESTAMP].is_a?(String)
    # This is a hack to help folks who are mucking with @timestamp during
    # their json filter. You aren't supposed to do anything with
    # "@timestamp" outside of the date filter, but nobody listens... ;)
    event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
  end

  filter_matched(event)
rescue => e
  event.tag("_jsonparsefailure")
  @logger.warn("Trouble parsing json", :source => @source,
               :raw => event[@source], :exception => e)
  return
end
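The TODO comment in that snippet can be demonstrated outside of Logstash: a top-level JSON object parses to a Ruby Hash and merges cleanly, while a top-level JSON list parses to an Array, which Hash#merge! rejects. A minimal standalone sketch (not part of the filter itself):

```ruby
require 'json'

dest = {}

# A top-level JSON object parses to a Hash and merges into dest fine.
dest.merge!(JSON.parse('{"a":"one"}'))

# A top-level JSON list parses to an Array, which Hash#merge! cannot
# accept -- exactly the case the TODO comment warns about.
begin
  dest.merge!(JSON.parse('[1, 2, 3]'))
rescue TypeError => e
  puts e.message
end
```

This is why the filter works for the object-shaped log lines in the question but needs extra handling for arrays nested inside them.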
You can modify the parsing procedure to modify the original JSON:
json = JSON.parse(source)
if json.is_a?(Hash)
  json.each do |key, value|
    if value.is_a?(Array)
      value.each_with_index do |object, index|
        # modify as you need
        object["index"] = index
      end
    end
  end
end
# save modified json
# ......
dest.merge!(json)
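Run standalone on a sample line shaped like the one in the question (a hypothetical input, outside of Logstash), the array-indexing modification above behaves like this:

```ruby
require 'json'

# Hypothetical log line containing a JSON array, as in the question.
source = '{"different":[{"this":"one","that":"uno"},{"this":"two"}]}'

json = JSON.parse(source)
if json.is_a?(Hash)
  json.each do |key, value|
    if value.is_a?(Array)
      value.each_with_index do |object, index|
        # Tag each array element with its position in the array.
        object["index"] = index if object.is_a?(Hash)
      end
    end
  end
end

puts json.inspect
```

Each hash inside the different array gains an "index" field (0, 1, ...), so the entries remain distinguishable after merging into the event.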
Then you can change your config file to use your new/modified JSON filter, and place the config under \logstash-1.xx\lib\logstash\config.

Here is my elastic_with_json.conf using the modified json.rb filter:
input {
  stdin {}
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {}
}
If you want to use your new filter instead, you can name it with config_name:
class LogStash::Filters::Json_index < LogStash::Filters::Base
  config_name "json_index"
  milestone 2
  ....
end
and configure it:
input {
  stdin {}
}
filter {
  json_index {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {}
}
Hope this helps.
For a quick and dirty hack, I used the Ruby filter with the code below; there's no need for the out-of-the-box json filter anymore:
input {
  stdin {}
}
filter {
  grok {
    match => ["message", "(?<json_raw>.*)"]
  }
  ruby {
    init => "
      def parse_json obj, pname=nil, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        obj = obj.to_hash unless obj.is_a? Hash
        obj.each {|k, v|
          p = pname.nil? ? k : pname
          if v.is_a? Array
            v.each_with_index {|oo, ii|
              parse_json_array(oo, ii, p, event)
            }
          elsif v.is_a? Hash
            parse_json(v, p, event)
          else
            p = pname.nil? ? k : [pname, k].join('.')
            event[p] = v
          end
        }
      end

      def parse_json_array obj, i, pname, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        pname_ = pname
        if obj.is_a? Hash
          obj.each {|k, v|
            p = [pname_, i, k].join('.')
            if v.is_a? Array
              v.each_with_index {|oo, ii|
                parse_json_array(oo, ii, p, event)
              }
            elsif v.is_a? Hash
              parse_json(v, p, event)
            else
              event[p] = v
            end
          }
        else
          n = [pname_, i].join('.')
          event[n] = obj
        end
      end
    "
    code => "parse_json(event['json_raw'].to_s, nil, event) if event['json_raw'].to_s.include? ':'"
  }
}
output {
  stdout { codec => rubydebug }
}
The test JSON structure:
{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}
And this is the output:
{
"message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
"@version" => "1",
"@timestamp" => "2014-07-25T00:06:00.814Z",
"host" => "Leis-MacBook-Pro.local",
"json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
"id" => 123,
"members.0.i" => 1,
"members.0.arr.0.ii" => 11,
"members.0.arr.1.ii" => 22,
"members.1.i" => 2,
"im_json" => 234,
"im_json.0.i" => 3,
"im_json.1.i" => 4
}
The solution I liked is the Ruby filter, because it doesn't require us to write another filter. However, that solution creates fields at the "root" of the JSON, and it's hard to keep track of what the original document looked like.

I came up with something similar that's easier to follow and is a recursive solution, so it's cleaner.
ruby {
  init => "
    def arrays_to_hash(h)
      h.each do |k, v|
        # If v is nil, an array is being iterated and the value is k.
        # If v is not nil, a hash is being iterated and the value is v.
        value = v || k
        if value.is_a?(Array)
          # 'value' is replaced with 'value_hash' later.
          value_hash = {}
          value.each_with_index do |v, i|
            value_hash[i.to_s] = v
          end
          h[k] = value_hash
        end
        if value.is_a?(Hash) || value.is_a?(Array)
          arrays_to_hash(value)
        end
      end
    end
  "
  code => "arrays_to_hash(event.to_hash)"
}
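Outside of Logstash, the same idea can be sketched on a plain Ruby hash instead of an event. This is a variant of the function above (not the exact filter code): it recurses into the converted hash, which also covers arrays nested directly inside arrays.

```ruby
# Recursively replace every Array in a nested hash with a Hash whose
# keys are the stringified element indexes.
def arrays_to_hash(h)
  h.each do |k, v|
    next unless v.is_a?(Array) || v.is_a?(Hash)
    if v.is_a?(Array)
      # Replace the array with a hash keyed by each element's index.
      v = v.each_with_index.map { |el, i| [i.to_s, el] }.to_h
      h[k] = v
    end
    arrays_to_hash(v)
  end
end

doc = { "different" => [{ "this" => "one", "that" => "uno" }, { "this" => "two" }] }
arrays_to_hash(doc)
# doc now maps "different" to a hash with keys "0" and "1" instead of an array,
# so Elasticsearch sees fields like different.0.this and different.1.this.
```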
It converts arrays into a hash, with each key being the index number. More details here: http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html