Logstash indexing JSON arrays
Logstash is awesome. I can send it JSON like this (multi-lined for readability):
{
  "a": "one",
  "b": {
    "alpha": "awesome"
  }
}
And then query for that line in Kibana using the search term b.alpha:awesome. Nice.
However, I now have a JSON log line like this:
{
  "different": [
    {
      "this": "one",
      "that": "uno"
    },
    {
      "this": "two"
    }
  ]
}
And I'd like to be able to find this line with a search like different.this:two (or different.this:one, or different.that:uno).
If I was using Lucene directly I'd iterate through the different array and generate a new search index for each hash within it, but Logstash currently seems to ingest that line like this:

different: {this: one, that: uno}, {this: two}
Which isn't going to help me search for log lines using different.this or different.that.
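In plain Ruby, the kind of expansion I'm after would look something like this (a sketch only; the flatten_for_search helper is my own illustration, not something Logstash provides):

```ruby
require 'json'

# Hypothetical helper: expand each hash inside an array field into
# dotted keys, collecting every element's value under "field.key".
def flatten_for_search(doc)
  out = {}
  doc.each do |field, value|
    if value.is_a?(Array)
      value.each do |element|
        element.each { |k, v| (out["#{field}.#{k}"] ||= []) << v }
      end
    else
      out[field] = value
    end
  end
  out
end

line = '{"different":[{"this":"one","that":"uno"},{"this":"two"}]}'
flat = flatten_for_search(JSON.parse(line))
# flat["different.this"] holds both "one" and "two", so a query on
# different.this:two could match this log line.
```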
Has anyone got any thoughts on a codec, filter, or code change I can make to enable this?
You can write your own filter (copy & paste, rename the class name and the config_name, and rewrite the filter(event) method), or modify the current JSON filter (source on GitHub).
You can find the JSON filter (Ruby class) source code in the path logstash-1.xx\lib\logstash\filters, in a file named json.rb.
The JSON filter parses the content as JSON, as follows:
begin
  # TODO(sissel): Note, this will not successfully handle json lists
  # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
  # which won't merge into a hash. If someone needs this, we can fix it
  # later.
  dest.merge!(JSON.parse(source))

  # If no target, we target the root of the event object. This can allow
  # you to overwrite @timestamp. If so, let's parse it as a timestamp!
  if !@target && event[TIMESTAMP].is_a?(String)
    # This is a hack to help folks who are mucking with @timestamp during
    # their json filter. You aren't supposed to do anything with
    # "@timestamp" outside of the date filter, but nobody listens... ;)
    event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
  end

  filter_matched(event)
rescue => e
  event.tag("_jsonparsefailure")
  @logger.warn("Trouble parsing json", :source => @source,
               :raw => event[@source], :exception => e)
  return
end
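The TODO comment in that snippet is easy to reproduce on its own: a top-level JSON list parses to a Ruby array, and Hash#merge! raises a TypeError when handed one, which is why the filter ends up in the rescue branch that tags _jsonparsefailure. A minimal sketch:

```ruby
require 'json'

dest = {}
dest.merge!(JSON.parse('{"a": 1}'))     # a hash merges cleanly

failed = false
begin
  dest.merge!(JSON.parse('[1, 2, 3]'))  # an array cannot merge into a hash
rescue TypeError
  # In the filter, this is where the event would get tagged "_jsonparsefailure".
  failed = true
end
```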
You can modify the parsing procedure to transform the original JSON:
json = JSON.parse(source)

if json.is_a?(Hash)
  json.each do |key, value|
    if value.is_a?(Array)
      value.each_with_index do |object, index|
        # modify as you need
        object["index"] = index
      end
    end
  end
end

# save the modified json
# ......
dest.merge!(json)
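Applied to the question's log line, that modification simply stamps each array element with its position before the merge (the "index" key name comes from the snippet above; the rest is a standalone sketch):

```ruby
require 'json'

source = '{"different":[{"this":"one","that":"uno"},{"this":"two"}]}'
json = JSON.parse(source)

if json.is_a?(Hash)
  json.each do |key, value|
    next unless value.is_a?(Array)
    value.each_with_index do |object, index|
      object["index"] = index  # mark each element with its array position
    end
  end
end
# json["different"] is now:
#   [{"this"=>"one", "that"=>"uno", "index"=>0}, {"this"=>"two", "index"=>1}]
```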
Then you can modify your config file to use your new/modified JSON filter, and place the config in \logstash-1.xx\lib\logstash\config.
This is my elastic_with_json.conf with the modified json.rb filter:
input {
  stdin {
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
If you want to use your new filter, you can configure it with its config_name:
class LogStash::Filters::Json_index < LogStash::Filters::Base
  config_name "json_index"
  milestone 2
  ....
end
and configure it:
input {
  stdin {
  }
}
filter {
  json_index {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
Hope this helps.
For a quick and dirty hack, I used the Ruby filter and the code below; there's no need to use the out-of-the-box 'json' filter anymore:
input {
  stdin {}
}
filter {
  grok {
    match => ["message", "(?<json_raw>.*)"]
  }
  ruby {
    init => "
      def parse_json obj, pname=nil, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        obj = obj.to_hash unless obj.is_a? Hash
        obj.each {|k, v|
          p = pname.nil? ? k : pname
          if v.is_a? Array
            v.each_with_index {|oo, ii|
              parse_json_array(oo, ii, p, event)
            }
          elsif v.is_a? Hash
            parse_json(v, p, event)
          else
            p = pname.nil? ? k : [pname, k].join('.')
            event[p] = v
          end
        }
      end

      def parse_json_array obj, i, pname, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        pname_ = pname
        if obj.is_a? Hash
          obj.each {|k, v|
            p = [pname_, i, k].join('.')
            if v.is_a? Array
              v.each_with_index {|oo, ii|
                parse_json_array(oo, ii, p, event)
              }
            elsif v.is_a? Hash
              parse_json(v, p, event)
            else
              event[p] = v
            end
          }
        else
          n = [pname_, i].join('.')
          event[n] = obj
        end
      end
    "
    code => "parse_json(event['json_raw'].to_s, nil, event) if event['json_raw'].to_s.include? ':'"
  }
}
output {
  stdout { codec => rubydebug }
}
Test JSON structure:
{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}
and this is what's output:
{
    "message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
    "@version" => "1",
    "@timestamp" => "2014-07-25T00:06:00.814Z",
    "host" => "Leis-MacBook-Pro.local",
    "json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
    "id" => 123,
    "members.0.i" => 1,
    "members.0.arr.0.ii" => 11,
    "members.0.arr.1.ii" => 22,
    "members.1.i" => 2,
    "im_json" => 234,
    "im_json.0.i" => 3,
    "im_json.1.i" => 4
}
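Stripped of the grok and event plumbing, the flattening those init functions perform can be sketched in plain Ruby; here the event is just a hash, and the simplified recursion below reproduces the dotted array keys (it sidesteps the nested-hash prefix quirk visible in the im_json fields above):

```ruby
require 'json'

# Simplified recursive flattener: nested hashes become "a.b" keys and
# array elements become "a.0", "a.1", ... like the filter output above.
def flatten_json(obj, prefix, out)
  case obj
  when Hash
    obj.each { |k, v| flatten_json(v, prefix ? "#{prefix}.#{k}" : k, out) }
  when Array
    obj.each_with_index { |v, i| flatten_json(v, "#{prefix}.#{i}", out) }
  else
    out[prefix] = obj
  end
end

event = {}
flatten_json(JSON.parse('{"id":123, "members":[{"i":1},{"i":2}]}'), nil, event)
# event => {"id"=>123, "members.0.i"=>1, "members.1.i"=>2}
```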
The solution I liked is the Ruby filter, because it doesn't require writing another filter. However, that solution creates fields at the "root" of the JSON, and it's hard to keep track of how the original document looked.
I came up with something similar that's easier to follow, and it's a recursive solution, so it's cleaner.
ruby {
  init => "
    def arrays_to_hash(h)
      h.each do |k, v|
        # If v is nil, an array is being iterated and the value is k.
        # If v is not nil, a hash is being iterated and the value is v.
        value = v || k
        if value.is_a?(Array)
          # 'value' is replaced with 'value_hash' later.
          value_hash = {}
          value.each_with_index do |v, i|
            value_hash[i.to_s] = v
          end
          h[k] = value_hash
        end
        if value.is_a?(Hash) || value.is_a?(Array)
          arrays_to_hash(value)
        end
      end
    end
  "
  code => "arrays_to_hash(event.to_hash)"
}
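Outside the filter, the same function can be exercised directly; with the question's line, each array becomes a hash keyed by its stringified index, so the nesting survives and Elasticsearch sees paths like different.0.this (the definition below mirrors the init string above, with the demo input taken from the question):

```ruby
require 'json'

# Same logic as the filter's init block: replace every array in the
# (mutable) hash with a hash whose keys are the stringified indices.
def arrays_to_hash(h)
  h.each do |k, v|
    value = v || k  # array iteration yields (element, nil); hash yields (key, value)
    if value.is_a?(Array)
      value_hash = {}
      value.each_with_index { |elem, i| value_hash[i.to_s] = elem }
      h[k] = value_hash
    end
    arrays_to_hash(value) if value.is_a?(Hash) || value.is_a?(Array)
  end
end

event = JSON.parse('{"different":[{"this":"one","that":"uno"},{"this":"two"}]}')
arrays_to_hash(event)
# event => {"different"=>{"0"=>{"this"=>"one", "that"=>"uno"}, "1"=>{"this"=>"two"}}}
```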
It converts arrays to hashes, with each key being the index number. More details: http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html