How to deserialise Kafka AVRO messages using Apache Beam

The main goal is to aggregate two Kafka topics: one carries compacted, slow-moving data and the other carries fast-moving data that is received every second.

I have been able to consume messages in simple scenarios, such as a KV (Long, String), using something like:

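PCollection<KV<Long, String>> input = p.apply(KafkaIO.<Long, String>read()
        // Bootstrap servers and topic are omitted here.
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // Drop the Kafka metadata so the result is a PCollection of KV pairs.
        .withoutMetadata());

PCollection<String> output = input.apply(Values.<String>create());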

But this doesn't seem to be the right approach when you need to deserialise from AVRO. I have a KV (STRING, AVRO) which I need to consume.

I tried generating the Java classes from the AVRO schema and then including them in the "apply", for example:

PCollection<MyClass> output = input.apply(Values.<MyClass>create());

But this didn't seem to be the correct approach.

Is there any documentation, or are there examples, that anyone could point me to so I can understand how to work with Kafka AVRO and Beam? Any help would be much appreciated.

I have updated my code:

import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.LongDeserializer;

public class Main {

    public static void main(String[] args) {

        PipelineOptions options = PipelineOptionsFactory.create();

        Pipeline p = Pipeline.create(options);

        PCollection<KV<Long, Myclass>> input = p.apply(KafkaIO.<Long, String>read()
                .withKeyDeserializer(LongDeserializer.class)
                .withValueDeserializerAndCoder(KafkaAvroDeserializer.class, AvroCoder.of(Myclass.class))
        );

        p.run();

    }
}
#######################################################
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.coders.DefaultCoder;

@DefaultCoder(AvroCoder.class)
public class Myclass {
    String name;
    String age;

    Myclass() {}

    Myclass(String n, String a) {
        this.name = n;
        this.age = a;
    }
}

But I now get the following error: incompatible types: java.lang.Class<io.confluent.kafka.serializers.KafkaAvroDeserializer> cannot be converted to java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<java.lang.String>>

Am I importing the wrong deserializer?

You can use KafkaAvroDeserializer as follows:

PCollection<KV<Long, MyClass>> input = p.apply(KafkaIO.<Long, String>read()
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializerAndCoder(KafkaAvroDeserializer.class, AvroCoder.of(MyClass.class)));

Where MyClass is the POJO class generated from the Avro schema.

Make sure your POJO class is annotated with @DefaultCoder(AvroCoder.class), as in the example below:

@DefaultCoder(AvroCoder.class)
public class MyClass {
    String name;
    String age;

    MyClass() {}

    MyClass(String n, String a) {
        this.name = n;
        this.age = a;
    }
}

I faced the same issue and found the solution in this mail archive: http://mail-archives.apache.org/mod_mbox/beam-user/201710.mbox/%3CCAMsy_NiVrT_9_xfxOtK1inHxb=x_yAdBcBN+4aquu_hn0GJ0nA@mail.gmail.com%3E

In your case, you need to define your own KafkaAvroDeserializer, as follows.

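import io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer;
import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig;
import java.util.Map;
import org.apache.kafka.common.serialization.Deserializer;

public class MyClassKafkaAvroDeserializer extends AbstractKafkaAvroDeserializer
        implements Deserializer<MyClass> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Hand the consumer configuration to the base class.
        configure(new KafkaAvroDeserializerConfig(configs));
    }

    @Override
    public MyClass deserialize(String s, byte[] bytes) {
        // The base class returns Object; narrow it to MyClass.
        return (MyClass) this.deserialize(bytes);
    }

    @Override
    public void close() {}
}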

Then specify your KafkaAvroDeserializer as the ValueDeserializer:

p.apply(KafkaIO.<Long, MyClass>read()
 .withKeyDeserializer(LongDeserializer.class)
 .withValueDeserializer(MyClassKafkaAvroDeserializer.class) );

Change KafkaIO.<Long, String>read() to KafkaIO.<Long, Object>read().

If you look at the implementation of KafkaAvroDeserializer, it implements Deserializer<Object>:

public class KafkaAvroDeserializer extends AbstractKafkaAvroDeserializer implements Deserializer<Object>
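
To make the mismatch concrete, here is a minimal sketch that uses only the JDK. The nested Deserializer interface, ObjectDeserializer, TextDeserializer and readValue are hypothetical stand-ins (they are not Kafka or Beam classes); readValue merely takes a parameter of the same shape as the Class<? extends Deserializer<V>> named in the error message.

```java
import java.nio.charset.StandardCharsets;

public class DeserializerTypeSketch {

    // Hypothetical stand-in for Kafka's Deserializer<T> interface.
    public interface Deserializer<T> {
        T deserialize(String topic, byte[] data);
    }

    // Stand-in for KafkaAvroDeserializer: the value type is fixed to Object.
    public static class ObjectDeserializer implements Deserializer<Object> {
        @Override
        public Object deserialize(String topic, byte[] data) {
            return new String(data, StandardCharsets.UTF_8);
        }
    }

    // Stand-in for a typed wrapper: delegates, then narrows the result.
    public static class TextDeserializer implements Deserializer<String> {
        private final ObjectDeserializer inner = new ObjectDeserializer();

        @Override
        public String deserialize(String topic, byte[] data) {
            return (String) inner.deserialize(topic, data);
        }
    }

    // Same parameter shape as the Class<? extends Deserializer<V>> in the error message.
    public static <V> V readValue(Class<? extends Deserializer<V>> type, byte[] data) {
        try {
            return type.getDeclaredConstructor().newInstance().deserialize("topic", data);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);

        // V = Object matches Deserializer<Object>, so this compiles.
        Object loose = DeserializerTypeSketch.<Object>readValue(ObjectDeserializer.class, payload);

        // V = String does not match and is rejected with "incompatible types":
        // String bad = DeserializerTypeSketch.<String>readValue(ObjectDeserializer.class, payload);

        // A deserializer declared for String satisfies V = String.
        String typed = readValue(TextDeserializer.class, payload);

        System.out.println(loose + " " + typed);
    }
}
```

The commented-out line corresponds to the original KafkaIO.<Long, String>read() call. Using Object as the value type or supplying a deserializer declared for the target type both satisfy the compiler in this sketch.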

Yohei's answer is good, but I also found this to work:

import io.confluent.kafka.streams.serdes.avro.SpecificAvroDeserializer;

...

public static class CustomKafkaAvroDeserializer extends SpecificAvroDeserializer<MyCustomClass> {}

...
.withValueDeserializerAndCoder(CustomKafkaAvroDeserializer.class, AvroCoder.of(MyCustomClass.class))
...

where MyCustomClass is generated with the Avro tools.
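
A likely reason the empty subclass helps is that a Java class literal cannot carry type arguments, whereas a subclass records them on its supertype. The following is a JDK-only sketch of that idea; Deserializer, GenericDeserializer, MyRecord, MyRecordDeserializer, accept and pinnedType are all hypothetical names introduced for illustration, not Confluent or Beam APIs.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class PinnedTypeSketch {

    // Hypothetical stand-in for Kafka's Deserializer<T> interface.
    public interface Deserializer<T> {
        T deserialize(byte[] data);
    }

    // Stand-in for a generic deserializer such as SpecificAvroDeserializer<T>.
    public static class GenericDeserializer<T> implements Deserializer<T> {
        @Override
        public T deserialize(byte[] data) {
            throw new UnsupportedOperationException("decoding is omitted in this sketch");
        }
    }

    // Hypothetical record type standing in for an Avro-generated class.
    public static class MyRecord {}

    // The empty subclass fixes T = MyRecord.
    public static class MyRecordDeserializer extends GenericDeserializer<MyRecord> {}

    // Accepts only class tokens whose deserializer produces V.
    public static <V> String accept(Class<? extends Deserializer<V>> type, Class<V> valueType) {
        return type.getSimpleName() + " -> " + valueType.getSimpleName();
    }

    // Reads the type argument recorded on the subclass's generic superclass.
    public static Type pinnedType(Class<?> type) {
        ParameterizedType parent = (ParameterizedType) type.getGenericSuperclass();
        return parent.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        // A class literal cannot carry type arguments:
        // GenericDeserializer<MyRecord>.class is not valid Java.

        // The subclass literal is an ordinary Class whose supertype is GenericDeserializer<MyRecord>.
        System.out.println(accept(MyRecordDeserializer.class, MyRecord.class));
        System.out.println(pinnedType(MyRecordDeserializer.class));
    }
}
```

In this sketch the subclass token type-checks against Class<? extends Deserializer<MyRecord>> without any cast, which mirrors what CustomKafkaAvroDeserializer appears to achieve in the answer above.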

I had a similar issue today and came across the following example, which resolved it for me.

https://github.com/andrewrjones/debezium-kafka-beam-example/blob/master/src/main/java/com/andrewjones/KafkaAvroConsumerExample.java

The missing piece for me was (Class)KafkaAvroDeserializer:

KafkaIO.<String, MyClass>read()
        .withBootstrapServers("kafka:9092")
        .withTopic("dbserver1.inventory.customers")
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializerAndCoder((Class)KafkaAvroDeserializer.class, AvroCoder.of(MyClass.class))
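
As a rough illustration of what such a cast does, the JDK-only sketch below compares a raw (Class) cast with a double cast through Class<?>. The double cast is a general Java idiom and an assumption on my part, not something taken from the linked example; Deserializer, ObjectDeserializer, readValue, viaRawCast and viaTypedCast are hypothetical stand-ins. Both casts are unchecked, so a mismatch between the declared and the actual value type would only surface at runtime.

```java
import java.nio.charset.StandardCharsets;

public class ClassCastSketch {

    // Hypothetical stand-in for Kafka's Deserializer<T> interface.
    public interface Deserializer<T> {
        T deserialize(byte[] data);
    }

    // Stand-in for KafkaAvroDeserializer: declared as Deserializer<Object>.
    public static class ObjectDeserializer implements Deserializer<Object> {
        @Override
        public Object deserialize(byte[] data) {
            return new String(data, StandardCharsets.UTF_8);
        }
    }

    // Instantiates the deserializer from its class token and decodes one value.
    public static <V> V readValue(Class<? extends Deserializer<V>> type, byte[] data) {
        try {
            return type.getDeclaredConstructor().newInstance().deserialize(data);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + type.getName(), e);
        }
    }

    // Raw cast, as in the answer above: compiles with an unchecked warning.
    // The result is handled as Object here because the value type is no longer verified.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static Object viaRawCast(byte[] data) {
        return readValue((Class) ObjectDeserializer.class, data);
    }

    // Double cast through Class<?>: also unchecked, but V stays String for the caller.
    @SuppressWarnings("unchecked")
    public static String viaTypedCast(byte[] data) {
        Class<? extends Deserializer<String>> token =
                (Class<? extends Deserializer<String>>) (Class<?>) ObjectDeserializer.class;
        return readValue(token, data);
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(viaRawCast(payload));
        System.out.println(viaTypedCast(payload));
    }
}
```

Either cast tells the compiler to trust the caller. If the deserializer actually returned a different type than the one declared, the failure would show up as a ClassCastException where the value is first used.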
