Manipulating JSON messages from Kafka topic using Logstash filter
I am using Logstash 2.4 to read JSON messages from a Kafka topic and send them to an Elasticsearch index. The messages have the following JSON format:
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "string",
        "optional": false,
        "field": "reloadID"
      },
      {
        "type": "string",
        "optional": false,
        "field": "externalAccountID"
      },
      {
        "type": "int64",
        "optional": false,
        "name": "org.apache.kafka.connect.data.Timestamp",
        "version": 1,
        "field": "reloadDate"
      },
      {
        "type": "int32",
        "optional": false,
        "field": "reloadAmount"
      },
      {
        "type": "string",
        "optional": true,
        "field": "reloadChannel"
      }
    ],
    "optional": false,
    "name": "reload"
  },
  "payload": {
    "reloadID": "328424295",
    "externalAccountID": "9831200013",
    "reloadDate": 1446242463000,
    "reloadAmount": 240,
    "reloadChannel": "C1"
  }
}
Without any filter in my config file, the target documents in the ES index look like this:
{
  "_index" : "kafka_reloads",
  "_type" : "logs",
  "_id" : "AVfcyTU4SyCFNFP2z5-l",
  "_score" : 1.0,
  "_source" : {
    "schema" : {
      "type" : "struct",
      "fields" : [ {
        "type" : "string",
        "optional" : false,
        "field" : "reloadID"
      }, {
        "type" : "string",
        "optional" : false,
        "field" : "externalAccountID"
      }, {
        "type" : "int64",
        "optional" : false,
        "name" : "org.apache.kafka.connect.data.Timestamp",
        "version" : 1,
        "field" : "reloadDate"
      }, {
        "type" : "int32",
        "optional" : false,
        "field" : "reloadAmount"
      }, {
        "type" : "string",
        "optional" : true,
        "field" : "reloadChannel"
      } ],
      "optional" : false,
      "name" : "reload"
    },
    "payload" : {
      "reloadID" : "155559213",
      "externalAccountID" : "9831200014",
      "reloadDate" : 1449529746000,
      "reloadAmount" : 140,
      "reloadChannel" : "C1"
    },
    "@version" : "1",
    "@timestamp" : "2016-10-19T11:56:09.973Z"
  }
}
However, I want only the value part of the "payload" field to end up in my ES index as the body of the target JSON document, so I tried using a "mutate" filter in the config file as below:
input {
  kafka {
    zk_connect => "zksrv-1:2181,zksrv-2:2181,zksrv-4:2181"
    group_id => "logstash"
    topic_id => "reload"
    consumer_threads => 3
  }
}
filter {
  mutate {
    remove_field => [ "schema", "@version", "@timestamp" ]
  }
}
output {
  elasticsearch {
    hosts => ["datanode-6:9200", "datanode-2:9200"]
    index => "kafka_reloads"
  }
}
With this filter, the ES documents now look like this:
{
  "_index" : "kafka_reloads",
  "_type" : "logs",
  "_id" : "AVfch0yhSyCFNFP2z59f",
  "_score" : 1.0,
  "_source" : {
    "payload" : {
      "reloadID" : "850846698",
      "externalAccountID" : "9831200013",
      "reloadDate" : 1449356706000,
      "reloadAmount" : 30,
      "reloadChannel" : "C1"
    }
  }
}
But what I actually need is this:
{
  "_index" : "kafka_reloads",
  "_type" : "logs",
  "_id" : "AVfch0yhSyCFNFP2z59f",
  "_score" : 1.0,
  "_source" : {
    "reloadID" : "850846698",
    "externalAccountID" : "9831200013",
    "reloadDate" : 1449356706000,
    "reloadAmount" : 30,
    "reloadChannel" : "C1"
  }
}
Is there a way to achieve this? Can anyone help me?
I also tried the filter below:
filter {
  json {
    source => "payload"
  }
}
But it gave me errors like this:
Error parsing json {:source=>"payload", :raw=>{"reloadID"=>"572584696", "externalAccountID"=>"9831200011", "reloadDate"=>1449093851000, "reloadAmount"=>180, "reloadChannel"=>"C1"}, :exception=>java.lang.ClassCastException: org.jruby.RubyHash cannot be cast to org.jruby.RubyIO, :level=>:warn}
Any help would be much appreciated.
Thanks,
Gautam Ghosh
You can achieve what you want with the following ruby filter:
ruby {
  code => "
    event.to_hash.delete_if {|k, v| k != 'payload'}
    event.to_hash.update(event['payload'].to_hash)
    event.to_hash.delete_if {|k, v| k == 'payload'}
  "
}
What it does is essentially:
1. remove all fields except payload
2. copy all of payload's inner fields to the root level
3. delete the payload field itself
You end up with exactly what you need.
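As a side note, the json filter attempt in the question fails because the Kafka input has already deserialized the message: by the time the filter runs, payload is a hash, not the JSON string the json filter expects, hence the RubyHash ClassCastException. Also be aware that the event['payload'] syntax above is the Logstash 2.x event API; Logstash 5.0 replaced it with event.get/event.set. A minimal, untested sketch of the same idea for Logstash 5.x and later:

ruby {
  code => "
    payload = event.get('payload')
    if payload.is_a?(Hash)
      # promote every inner field of payload to the root of the event
      payload.each { |k, v| event.set(k, v) }
      # then drop the wrapper field itself
      event.remove('payload')
    end
  "
}

Combined with a mutate filter that removes schema, this leaves only the payload fields in the indexed document.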
It's been a while, but here is a working workaround; hope it's useful:
json_encode {
  source => "json"
  target => "json_string"
}
json {
  source => "json_string"
}
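Note that json_encode is not bundled with Logstash by default; it ships as the separate logstash-filter-json_encode plugin (installable, if memory serves, via bin/plugin install logstash-filter-json_encode on Logstash 2.x, or bin/logstash-plugin install on 5.x and later). Adapted to this question, a hedged sketch of the full round trip (the field name payload_string is my own placeholder) could look like:

filter {
  # drop the Kafka Connect envelope
  mutate {
    remove_field => [ "schema" ]
  }
  # re-serialize the already-parsed payload hash into a JSON string...
  json_encode {
    source => "payload"
    target => "payload_string"
  }
  # ...then parse that string back; with no target set, the json filter
  # writes the parsed fields at the root of the event
  json {
    source => "payload_string"
  }
  # finally remove the leftover wrapper fields
  mutate {
    remove_field => [ "payload", "payload_string" ]
  }
}

Encoding and immediately re-decoding looks redundant, but it sidesteps the RubyHash error: the json filter receives a genuine string this time.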