
KafkaAvroSerializer for serializing Avro without schema.registry.url

I'm new to Kafka and Avro, and I have been trying to get a producer/consumer running. So far I have been able to produce and consume simple bytes and strings, using the following configuration for the producer:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

    Schema.Parser parser = new Schema.Parser();
    Schema schema = parser.parse(USER_SCHEMA);
    Injection<GenericRecord, byte[]> recordInjection = GenericAvroCodecs.toBinary(schema);

    KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);

    for (int i = 0; i < 1000; i++) {
        GenericData.Record avroRecord = new GenericData.Record(schema);
        avroRecord.put("str1", "Str 1-" + i);
        avroRecord.put("str2", "Str 2-" + i);
        avroRecord.put("int1", i);

        byte[] bytes = recordInjection.apply(avroRecord);

        ProducerRecord<String, byte[]> record = new ProducerRecord<>("mytopic", bytes);
        producer.send(record);
        Thread.sleep(250);
    }
    producer.close();

Now this is all well and good; the problem comes when I try to serialize a POJO. I was able to get the Avro schema from the POJO using the utility provided with Avro, hardcoded the schema, and then tried to create a GenericRecord to send through the KafkaProducer. The producer is now set up as:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");

    Schema.Parser parser = new Schema.Parser();
    Schema schema = parser.parse(USER_SCHEMA); // this is the generated Avro schema
    KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props);

This is where the problem is: the moment I use KafkaAvroSerializer, the producer fails to start due to: missing mandatory parameter: schema.registry.url

I read up on why this is required: so that my consumer is able to decipher whatever the producer sends to it. But isn't the schema already embedded in the Avro message? It would be really great if someone could share a working example of using KafkaProducer with the KafkaAvroSerializer without having to specify schema.registry.url.

I would also really appreciate any insights/resources on the utility of the schema registry.

Thanks!

Note first: KafkaAvroSerializer is not provided in vanilla Apache Kafka - it is provided by Confluent Platform ( https://www.confluent.io/ ), as part of its open source components ( http://docs.confluent.io/current/platform.html#confluent-schema-registry ).

Rapid answer: no, if you use KafkaAvroSerializer, you will need a schema registry. See some samples here: http://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html

The basic idea with the schema registry is that each topic will refer to an Avro schema (ie, you will only be able to send data coherent with each other; but a schema can have multiple versions, so you still need to identify the schema for each record).

We don't want to write the schema for every record like you imply - often, the schema is bigger than your data! That would be a waste of time parsing it every time when reading, and a waste of resources (network, disk, CPU).

Instead, a schema registry instance will maintain a binding avro schema <-> int schemaId, and the serializer will then write only this id before the data, after getting it from the registry (and caching it for later use).

So inside Kafka, your record will be [<id> <bytesavro>] (plus a magic byte, for technical reasons), which is an overhead of only 5 bytes (compare that to the size of your schema). And when reading, your consumer will find the schema corresponding to the id, and deserialize the Avro bytes against it. You can find much more in the Confluent docs.
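To make that framing concrete, here is a minimal sketch (plain Java, no Confluent libraries) of how the 5-byte header could be peeled off a record framed this way; the leading magic byte of 0 and the 4-byte big-endian schema id match the layout described above:

```java
import java.nio.ByteBuffer;

public class WireFormat {
    // Reads the 5-byte header: magic byte 0x0, then a 4-byte big-endian schema id.
    static int schemaId(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        byte magic = buf.get();
        if (magic != 0) {
            throw new IllegalArgumentException("Unknown magic byte: " + magic);
        }
        return buf.getInt();
    }

    // Everything after the 5-byte header is the raw Avro-encoded payload.
    static byte[] payload(byte[] record) {
        byte[] out = new byte[record.length - 5];
        System.arraycopy(record, 5, out, 0, out.length);
        return out;
    }

    public static void main(String[] args) {
        // A hand-built framed record: id = 42, payload = {1, 2, 3}
        byte[] framed = {0, 0, 0, 0, 42, 1, 2, 3};
        System.out.println(schemaId(framed));       // 42
        System.out.println(payload(framed).length); // 3
    }
}
```

The consumer would then ask the registry (once, then cache) for the schema behind id 42 before decoding the payload.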

If you really have a use case where you want to write the schema for every record, you will need another serializer (I think writing your own would be easy: just reuse https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerializer.java and remove the schema registry part, replacing it with the schema; same for reading). But if you use Avro, I would really discourage this - one day, you will need to implement something like the schema registry anyway to manage versioning.
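As a rough illustration of that schema-per-record alternative (a hypothetical framing, not what the Confluent serializer does), one could length-prefix the schema JSON in front of each payload - and note how even a trivial schema already dwarfs a small payload:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical self-describing framing: [4-byte schema length][schema JSON][avro payload]
public class SelfDescribingFrame {
    static byte[] frame(String schemaJson, byte[] payload) throws IOException {
        byte[] schemaBytes = schemaJson.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(ByteBuffer.allocate(4).putInt(schemaBytes.length).array());
        out.write(schemaBytes);
        out.write(payload);
        return out.toByteArray();
    }

    // Recovers the embedded schema JSON from a framed record.
    static String schemaOf(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        byte[] schemaBytes = new byte[buf.getInt()];
        buf.get(schemaBytes);
        return new String(schemaBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String schema = "{\"type\":\"record\",\"name\":\"User\",\"fields\":[]}";
        byte[] record = frame(schema, new byte[]{1, 2, 3});
        System.out.println(schemaOf(record).equals(schema)); // true
        // The 3-byte payload travels with 40+ bytes of schema on every record.
    }
}
```

This is exactly the overhead the registry's 5-byte id avoids.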

While the accepted answer is correct, it should also be mentioned that schema registration can be disabled.

Simply set auto.register.schemas to false.
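That is a single producer config entry (note: as far as I know this only stops the serializer from registering new schemas; it will still contact the registry to look up the id of an already-registered schema):

```
props.put("auto.register.schemas", false);
```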

As others have pointed out, KafkaAvroSerializer requires the Schema Registry, which is part of Confluent Platform, and usage requires licensing.

The main advantage of using the schema registry is that your bytes on the wire will be smaller, as opposed to writing a binary payload with the schema for every message.

I wrote a blog post detailing the advantages.

You can always make your value classes implement Serialiser<T> and Deserialiser<T> (and Serde<T> for Kafka Streams) manually. Java classes are usually generated from Avro files, so editing them directly isn't a good idea, but wrapping them is a verbose yet possible way.

Another way is to tune the Avro generator templates that are used for Java class generation and generate implementations of all those interfaces automatically. Both the Avro Maven and Gradle plugins support custom templates, so it should be easy to configure.

I've created https://github.com/artemyarulin/avro-kafka-deserializable which has the changed template files and a simple CLI tool that you can use for file generation.

You can create your own custom Avro serializer; then, even without the Schema Registry, you would be able to produce records to topics. Check the article below.

https://codenotfound.com/spring-kafka-apache-avro-serializer-deserializer-example.html

There they have used KafkaTemplate. I have tried using

KafkaProducer<String, User> UserKafkaProducer

and it is working fine. But if you want to use KafkaAvroSerialiser, you need to give the Schema Registry URL.
