
How to deserialize Avro messages from Kafka in Flink (Scala)?

I'm reading messages from Kafka into the Flink shell (Scala) as follows:

scala> val stream = senv.addSource(new FlinkKafkaConsumer011[String]("topic", new SimpleStringSchema(), properties)).print()
warning: there was one deprecation warning; re-run with -deprecation for details
stream: org.apache.flink.streaming.api.datastream.DataStreamSink[String] = org.apache.flink.streaming.api.datastream.DataStreamSink@71de1091

Here I'm using SimpleStringSchema() as the deserializer, but the messages actually have a different Avro schema (say, msg.avsc). How do I create a deserializer based on that Avro schema (msg.avsc) to deserialize the incoming Kafka messages?

I haven't been able to find any code examples or tutorials for doing this in Scala, so any input would help. It seems that I may need to extend and implement

org.apache.flink.streaming.util.serialization.DeserializationSchema

to decode the messages, but I don't know how to do that. Any tutorials or instructions would be a great help. Since I don't want to do any custom processing, just to parse the messages as per the Avro schema (msg.avsc), any quick way of doing this would be very helpful.

I found an example of an AvroDeserializationSchema class in Java:

https://github.com/okkam-it/flink-examples/blob/master/src/main/java/org/okkam/flink/avro/AvroDeserializationSchema.java
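A hand-rolled Scala equivalent might look roughly like the sketch below. This is a minimal, untested sketch: it assumes the records were written with Avro's plain binary encoding (no Confluent wire format), that msg.avsc is bundled on the classpath, and the class name AvroMessageSchema is made up for illustration.

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.util.serialization.DeserializationSchema

// Decodes raw Avro bytes into GenericRecord using the schema in msg.avsc.
// (In newer Flink versions the interface lives in
// org.apache.flink.api.common.serialization.)
class AvroMessageSchema extends DeserializationSchema[GenericRecord] {

  // The schema and datum reader may not be serializable, so build them
  // lazily on the task managers rather than on the client.
  @transient private lazy val schema: Schema =
    new Schema.Parser().parse(getClass.getResourceAsStream("/msg.avsc"))
  @transient private lazy val reader =
    new GenericDatumReader[GenericRecord](schema)

  override def deserialize(message: Array[Byte]): GenericRecord = {
    val decoder = DecoderFactory.get().binaryDecoder(message, null)
    reader.read(null, decoder)
  }

  override def isEndOfStream(nextElement: GenericRecord): Boolean = false

  override def getProducedType: TypeInformation[GenericRecord] =
    TypeInformation.of(classOf[GenericRecord])
}

It would then be passed to the consumer in place of SimpleStringSchema:

val stream = senv.addSource(
  new FlinkKafkaConsumer011[GenericRecord]("topic", new AvroMessageSchema(), properties)).print()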

If you want to deserialize into a specific case class (here DeviceData), use new FlinkKafkaConsumer011[DeviceData] together with new AvroDeserializationSchema[DeviceData](classOf[DeviceData]). Code snippet:

val stream = env.addSource(
  new FlinkKafkaConsumer011[DeviceData](
    "test",
    new AvroDeserializationSchema[DeviceData](classOf[DeviceData]),
    properties))

If you use Confluent's schema registry, the preferred solution is to use the Avro serde provided by Confluent. We just call deserialize(); the resolution of which version of the Avro schema to use is done automatically behind the scenes, and no byte manipulation is required.

It boils down to something like this in Scala:

import io.confluent.kafka.serializers.{AbstractKafkaAvroSerDeConfig, KafkaAvroDeserializer}
import org.apache.avro.generic.GenericRecord
import scala.collection.JavaConverters._

...

// Configure the value deserializer against the schema registry
// (isKey = false; keyDeserializer is set up the same way with isKey = true).
val valueDeserializer = new KafkaAvroDeserializer()
valueDeserializer.configure(
  Map(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG -> schemaRegistryUrl).asJava,
  false)

...

// Inside a KeyedDeserializationSchema[KafkaKV], where KafkaKV is a simple
// case class holding the decoded key and value as GenericRecords.
override def deserialize(messageKey: Array[Byte], message: Array[Byte],
                         topic: String, partition: Int, offset: Long): KafkaKV = {

  val key = keyDeserializer.deserialize(topic, messageKey).asInstanceOf[GenericRecord]
  val value = valueDeserializer.deserialize(topic, message).asInstanceOf[GenericRecord]

  KafkaKV(key, value)
}

...

There is a detailed explanation here: http://svend.kelesia.com/how-to-integrate-flink-with-confluents-schema-registry.html#how-to-integrate-flink-with-confluents-schema-registry
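For completeness: newer Flink releases also ship this integration out of the box, in the flink-avro-confluent-registry module, as ConfluentRegistryAvroDeserializationSchema, which fetches the writer schema of each record from the registry for you. A rough sketch, assuming that module is available and the registry URL is illustrative:

import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema

// Reader schema parsed from msg.avsc; the writer schema of each record is
// resolved against the Confluent registry automatically.
val schema: Schema = new Schema.Parser().parse(new java.io.File("msg.avsc"))
val deserializer = ConfluentRegistryAvroDeserializationSchema.forGeneric(
  schema, "http://schema-registry:8081")

val stream = senv.addSource(
  new FlinkKafkaConsumer011[GenericRecord]("topic", deserializer, properties))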

Hope it helps!
