
Manipulating JSON messages from Kafka topic using Logstash filter

I am using Logstash 2.4 to read JSON messages from a Kafka topic and send them to an Elasticsearch index.

The JSON format is as follows:

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "string",
        "optional": false,
        "field": "reloadID"
      },
      {
        "type": "string",
        "optional": false,
        "field": "externalAccountID"
      },
      {
        "type": "int64",
        "optional": false,
        "name": "org.apache.kafka.connect.data.Timestamp",
        "version": 1,
        "field": "reloadDate"
      },
      {
        "type": "int32",
        "optional": false,
        "field": "reloadAmount"
      },
      {
        "type": "string",
        "optional": true,
        "field": "reloadChannel"
      }
    ],
    "optional": false,
    "name": "reload"
  },
  "payload": {
    "reloadID": "328424295",
    "externalAccountID": "9831200013",
    "reloadDate": 1446242463000,
    "reloadAmount": 240,
    "reloadChannel": "C1"
  }
}

Without any filter in my config file, the documents in the ES index look like this:

{
  "_index" : "kafka_reloads",
  "_type" : "logs",
  "_id" : "AVfcyTU4SyCFNFP2z5-l",
  "_score" : 1.0,
  "_source" : {
    "schema" : {
      "type" : "struct",
      "fields" : [ {
        "type" : "string",
        "optional" : false,
        "field" : "reloadID"
      }, {
        "type" : "string",
        "optional" : false,
        "field" : "externalAccountID"
      }, {
        "type" : "int64",
        "optional" : false,
        "name" : "org.apache.kafka.connect.data.Timestamp",
        "version" : 1,
        "field" : "reloadDate"
      }, {
        "type" : "int32",
        "optional" : false,
        "field" : "reloadAmount"
      }, {
        "type" : "string",
        "optional" : true,
        "field" : "reloadChannel"
      } ],
      "optional" : false,
      "name" : "reload"
    },
    "payload" : {
      "reloadID" : "155559213",
      "externalAccountID" : "9831200014",
      "reloadDate" : 1449529746000,
      "reloadAmount" : 140,
      "reloadChannel" : "C1"
    },
    "@version" : "1",
    "@timestamp" : "2016-10-19T11:56:09.973Z"
  }
}

However, I want only the contents of the "payload" field to be sent to my ES index as the document body. So I tried using the 'mutate' filter in my config file as follows:

input {
  kafka {
    zk_connect => "zksrv-1:2181,zksrv-2:2181,zksrv-4:2181"
    group_id => "logstash"
    topic_id => "reload"
    consumer_threads => 3
  }
}
filter {
  mutate {
    remove_field => [ "schema", "@version", "@timestamp" ]
  }
}
output {
  elasticsearch {
    hosts => ["datanode-6:9200", "datanode-2:9200"]
    index => "kafka_reloads"
  }
}

With this filter, the ES documents now look like this:

{
  "_index" : "kafka_reloads",
  "_type" : "logs",
  "_id" : "AVfch0yhSyCFNFP2z59f",
  "_score" : 1.0,
  "_source" : {
    "payload" : {
      "reloadID" : "850846698",
      "externalAccountID" : "9831200013",
      "reloadDate" : 1449356706000,
      "reloadAmount" : 30,
      "reloadChannel" : "C1"
    }
  }
}

But what I actually want is this:

{
  "_index" : "kafka_reloads",
  "_type" : "logs",
  "_id" : "AVfch0yhSyCFNFP2z59f",
  "_score" : 1.0,
  "_source" : {
    "reloadID" : "850846698",
    "externalAccountID" : "9831200013",
    "reloadDate" : 1449356706000,
    "reloadAmount" : 30,
    "reloadChannel" : "C1"
  }
}

Is there a way to do this? Can anyone help me with this?

I also tried the following filter:

filter {
   json {
      source => "payload"
   }
}

But that gives me errors like this:

Error parsing json {:source=>"payload", :raw=>{"reloadID"=>"572584696", "externalAccountID"=>"9831200011", "reloadDate"=>1449093851000, "reloadAmount"=>180, "reloadChannel"=>"C1"}, :exception=>java.lang.ClassCastException: org.jruby.RubyHash cannot be cast to org.jruby.RubyIO, :level=>:warn}
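The :raw value in that warning is a Ruby hash, not a JSON string: the message has already been decoded before the filter runs (the kafka input decodes messages as JSON by default), so "payload" is a nested hash and the json filter has no string to parse. The same mismatch can be reproduced in plain Ruby with the standard json library. This is only an illustration outside Logstash; the sample values are copied from the warning above.

```ruby
require "json"

# The event roughly as the filter sees it after decoding:
# "payload" is already a Hash, not a String.
event = {
  "payload" => {
    "reloadID" => "572584696",
    "externalAccountID" => "9831200011",
    "reloadDate" => 1449093851000,
    "reloadAmount" => 180,
    "reloadChannel" => "C1"
  }
}

# Parsing a Hash fails, analogous to the json filter's ClassCastException.
begin
  JSON.parse(event["payload"])
rescue TypeError => e
  puts "cannot parse a Hash: #{e.class}"
end

# Parsing works only on a String, e.g. the hash serialized back to JSON.
reparsed = JSON.parse(JSON.generate(event["payload"]))
puts reparsed == event["payload"]  # prints "true"
```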

Any help will be much appreciated.

Thanks,
Gautam Ghosh

You can achieve what you want using the following ruby filter:

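  ruby {
     code => "
        # keep only the 'payload' field (drops schema, @version, @timestamp)
        event.to_hash.delete_if {|k, v| k != 'payload'}
        # copy every key inside 'payload' up to the root of the event
        event.to_hash.update(event['payload'].to_hash)
        # remove the now-redundant 'payload' field itself
        event.to_hash.delete_if {|k, v| k == 'payload'}
     "
  }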

What it does is:

  1. remove all fields except payload
  2. copy all of the payload's inner fields to the root level
  3. delete the payload field itself
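To check the three steps without running Logstash, they can be replayed on an ordinary Ruby Hash standing in for event.to_hash. This is only a sketch: the flatten_payload helper is hypothetical, the sample values are taken from the question, and a real Logstash event is not a plain Hash.

```ruby
# Hypothetical helper that replays the ruby filter's three steps on a
# plain Hash standing in for event.to_hash (Logstash 2.x semantics).
def flatten_payload(data)
  # 1. remove every top-level field except "payload"
  data.delete_if { |k, _v| k != "payload" }
  # 2. copy the payload's inner fields up to the root level
  data.update(data["payload"].to_hash)
  # 3. delete the "payload" field itself
  data.delete_if { |k, _v| k == "payload" }
  data
end

event_data = {
  "schema"     => { "type" => "struct", "name" => "reload" },
  "payload"    => {
    "reloadID"          => "850846698",
    "externalAccountID" => "9831200013",
    "reloadDate"        => 1449356706000,
    "reloadAmount"      => 30,
    "reloadChannel"     => "C1"
  },
  "@version"   => "1",
  "@timestamp" => "2016-10-19T11:56:09.973Z"
}

flatten_payload(event_data)
p event_data.keys
# prints ["reloadID", "externalAccountID", "reloadDate", "reloadAmount", "reloadChannel"]
```

Because Ruby hashes keep insertion order, the flattened fields come out in the same order as in the original payload.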

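You'll end up with what you need. Note that this relies on the Logstash 2.x event API (event['payload'] for field access, and event.to_hash returning the event's live data); Logstash 5.0 replaced field access with event.get and event.set, so the code would need adapting there.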

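It's been a while, but here is a working workaround; hope it's useful. The json filter in the question fails because "payload" is already a parsed hash (see the :raw value in the warning), while the json filter expects a string. Serializing the field back into a string with json_encode and then parsing it again gets around this (json_encode is a separate plugin, logstash-filter-json_encode, so it may need to be installed first):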

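filter {
  # serialize the already-parsed "payload" hash back into a JSON string
  json_encode {
    source => "payload"
    target => "json_string"
  }
  # parse the string again; with no target, its keys land at the root of the event
  json {
    source => "json_string"
    remove_field => [ "payload", "json_string", "schema" ]
  }
}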
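The idea behind the round trip can be sketched in plain Ruby: serialize the nested hash to a string (the json_encode step), parse that string again, and merge the result into the root of the event (what the json filter does when no target is set). The reencode_payload helper and the sample values are hypothetical and for illustration only; this does not reproduce the plugins' actual code.

```ruby
require "json"

# Hypothetical helper mimicking json_encode (source "payload", target
# "json_string") followed by json (source "json_string", no target).
def reencode_payload(event)
  # json_encode step: turn the parsed hash back into a JSON string
  event["json_string"] = JSON.generate(event["payload"])
  # json step with no target: parsed keys are merged into the event root
  event.merge!(JSON.parse(event["json_string"]))
  # drop the helper fields afterwards (in Logstash, e.g. via remove_field)
  event.delete("payload")
  event.delete("json_string")
  event
end

event = {
  "payload" => {
    "reloadID"          => "850846698",
    "externalAccountID" => "9831200013",
    "reloadDate"        => 1449356706000,
    "reloadAmount"      => 30,
    "reloadChannel"     => "C1"
  }
}

reencode_payload(event)
puts event["reloadAmount"]  # prints 30
```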
