
Manipulating JSON messages from Kafka topic using Logstash filter

I am using Logstash 2.4 to read JSON messages from a Kafka topic and send them to an Elasticsearch index.

The JSON format is as follows:

{
   "schema": {
      "type": "struct",
      "fields": [
         {
            "type": "string",
            "optional": false,
            "field": "reloadID"
         },
         {
            "type": "string",
            "optional": false,
            "field": "externalAccountID"
         },
         {
            "type": "int64",
            "optional": false,
            "name": "org.apache.kafka.connect.data.Timestamp",
            "version": 1,
            "field": "reloadDate"
         },
         {
            "type": "int32",
            "optional": false,
            "field": "reloadAmount"
         },
         {
            "type": "string",
            "optional": true,
            "field": "reloadChannel"
         }
      ],
      "optional": false,
      "name": "reload"
   },
   "payload": {
      "reloadID": "328424295",
      "externalAccountID": "9831200013",
      "reloadDate": 1446242463000,
      "reloadAmount": 240,
      "reloadChannel": "C1"
   }
}

Without any filter in my config file, the target document in the ES index looks like this:

{
  "_index" : "kafka_reloads",
  "_type" : "logs",
  "_id" : "AVfcyTU4SyCFNFP2z5-l",
  "_score" : 1.0,
  "_source" : {
    "schema" : {
      "type" : "struct",
      "fields" : [ {
        "type" : "string",
        "optional" : false,
        "field" : "reloadID"
      }, {
        "type" : "string",
        "optional" : false,
        "field" : "externalAccountID"
      }, {
        "type" : "int64",
        "optional" : false,
        "name" : "org.apache.kafka.connect.data.Timestamp",
        "version" : 1,
        "field" : "reloadDate"
      }, {
        "type" : "int32",
        "optional" : false,
        "field" : "reloadAmount"
      }, {
        "type" : "string",
        "optional" : true,
        "field" : "reloadChannel"
      } ],
      "optional" : false,
      "name" : "reload"
    },
    "payload" : {
      "reloadID" : "155559213",
      "externalAccountID" : "9831200014",
      "reloadDate" : 1449529746000,
      "reloadAmount" : 140,
      "reloadChannel" : "C1"
    },
    "@version" : "1",
    "@timestamp" : "2016-10-19T11:56:09.973Z",
  }
}

However, I only want to move the value part of the "payload" field into my ES index as the target JSON body. So I tried using the "mutate" filter in the config file, as below:

input {
  kafka {
    zk_connect => "zksrv-1:2181,zksrv-2:2181,zksrv-4:2181"
    group_id => "logstash"
    topic_id => "reload"
    consumer_threads => 3
  }
}
filter {
  mutate {
    remove_field => [ "schema","@version","@timestamp" ]
  }
}
output {
  elasticsearch {
    hosts => ["datanode-6:9200","datanode-2:9200"]
    index => "kafka_reloads"
  }
}

With this filter, the ES documents now look like this:

{
      "_index" : "kafka_reloads",
      "_type" : "logs",
      "_id" : "AVfch0yhSyCFNFP2z59f",
      "_score" : 1.0,
      "_source" : {
        "payload" : {
          "reloadID" : "850846698",
          "externalAccountID" : "9831200013",
          "reloadDate" : 1449356706000,
          "reloadAmount" : 30,
          "reloadChannel" : "C1"
        }
      }
}

But I actually need them to look like this:

{
      "_index" : "kafka_reloads",
      "_type" : "logs",
      "_id" : "AVfch0yhSyCFNFP2z59f",
      "_score" : 1.0,
      "_source" : {
          "reloadID" : "850846698",
          "externalAccountID" : "9831200013",
          "reloadDate" : 1449356706000,
          "reloadAmount" : 30,
          "reloadChannel" : "C1"
      }
}

Is there a way to do this? Can anyone help me out?

I have also tried the filter below:

filter {
   json {
      source => "payload"
   }
}

But it gives me errors like this:

Error parsing json {:source=>"payload", :raw=>{"reloadID"=>"572584696", "externalAccountID"=>"9831200011", "reloadDate"=>1449093851000, "reloadAmount"=>180, "reloadChannel"=>"C1"}, :exception=>java.lang.ClassCastException: org.jruby.RubyHash cannot be cast to org.jruby.RubyIO, :level=>:warn}

Any help would be much appreciated.

Thanks, Gautam Ghosh

You can achieve what you want with the following ruby filter:

  ruby {
     code => "
        event.to_hash.delete_if {|k, v| k != 'payload'}
        event.to_hash.update(event['payload'].to_hash)
        event.to_hash.delete_if {|k, v| k == 'payload'}
     "
  }

What it does is:

  1. Remove all fields except payload
  2. Copy all the payload inner fields to the root level
  3. Delete the payload field itself

You will end up with exactly what you need.
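
Note that this answer mutates the event's hash directly, which relies on how Logstash 2.x exposes the event. On newer Logstash releases (5.x and later) the event is accessed through the event.get / event.set / event.remove API instead, so the same idea would look roughly like the sketch below. This is an untested adaptation of the answer above, not part of it:

  ruby {
     code => "
        # Read the parsed 'payload' hash, promote each of its keys to the
        # event root, then drop the wrapper field itself.
        payload = event.get('payload')
        unless payload.nil?
           payload.each { |k, v| event.set(k.to_s, v) }
           event.remove('payload')
        end
     "
  }

The 'schema', '@version' and '@timestamp' fields can still be removed with the mutate filter from the question.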

It has been a while, but here is a workaround that works; I hope it is useful.

json_encode {
  source => "json"
  target => "json_string"
}

json {
  source => "json_string"
}
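
For completeness, here is roughly how that workaround could be wired up for the "payload" field from the question. This is only a sketch: "payload_json" is an arbitrary scratch field name chosen here, and json_encode comes from the separate logstash-filter-json_encode plugin, which may need to be installed first (e.g. bin/logstash-plugin install logstash-filter-json_encode):

  filter {
    # Serialize the already-parsed 'payload' hash back into a JSON string.
    json_encode {
      source => "payload"
      target => "payload_json"
    }
    # Parse that string; with no explicit target, the json filter writes
    # the decoded keys at the root of the event.
    json {
      source => "payload_json"
    }
    # Drop the scratch field plus the metadata we do not want indexed.
    mutate {
      remove_field => [ "schema", "payload", "payload_json", "@version", "@timestamp" ]
    }
  }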
