
Kafka Streams creates Avro topic without schema

I developed a Java application that reads data from an Avro topic using Schema Registry, applies simple transformations, and prints the result to the console. By default I used the GenericAvroSerde class for keys and values. Everything worked fine, except that I additionally had to define configuration for each serde, like:

    // the serdes do not pick up the Schema Registry URL on their own,
    // so it has to be passed to each one via configure()
    final Map<String, String> serdeConfig = Collections.singletonMap("schema.registry.url", kafkaStreamsConfig.getProperty("schema.registry.url"));
    final Serde<GenericRecord> keyGenericAvroSerde = new GenericAvroSerde();
    final Serde<GenericRecord> valueGenericAvroSerde = new GenericAvroSerde();
    keyGenericAvroSerde.configure(serdeConfig, true);    // isKey = true
    valueGenericAvroSerde.configure(serdeConfig, false); // isKey = false

Without that, I always get an error like:

Exception in thread "NTB27821-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Failed to deserialize value for record. topic=CH-PGP-LP2_S20-002_agg, partition=0, offset=4482940
at org.apache.kafka.streams.processor.internals.SourceNodeRecordDeserializer.deserialize(SourceNodeRecordDeserializer.java:46)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:84)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:117)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:474)
at org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:642)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:548)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:519)
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 69
Caused by: java.lang.NullPointerException
    at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:122)
    at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:93)
    at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:55)
    at io.confluent.kafka.streams.serdes.avro.GenericAvroDeserializer.deserialize(GenericAvroDeserializer.java:63)
    at io.confluent.kafka.streams.serdes.avro.GenericAvroDeserializer.deserialize(GenericAvroDeserializer.java:39)
    at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:65)
    at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:55)
    at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:56)
    at org.apache.kafka.streams.processor.internals.SourceNodeRecordDeserializer.deserialize(SourceNodeRecordDeserializer.java:44)
    at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:84)
    at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:117)
    at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:474)
    at org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:642)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:548)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:519)

Well, it was unusual, but fine; after that (once I added the configuration calls posted above) it worked, and my application was able to do all the operations and print the result.

But! When I tried to call through(), just to post data to a new topic, I hit the problem I am asking about: THE TOPIC WAS CREATED WITHOUT A SCHEMA. How can that be?

The interesting fact is that the data is being written, but: a) it is in binary format, so a plain consumer cannot read it; b) it has no schema, so an Avro consumer can't read it either:

    Processed a total of 1 messages
[2017-10-05 11:25:53,241] ERROR Unknown error when running consumer:  (kafka.tools.ConsoleConsumer$:105)
org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 0
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
        at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:182)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:203)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:379)
        at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:372)
        at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:65)
        at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:131)
        at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:122)
        at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:93)
        at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:122)
        at io.confluent.kafka.formatter.AvroMessageFormatter.writeTo(AvroMessageFormatter.java:114)
        at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:140)
        at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:78)
        at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:53)
        at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
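The "id 0" in the error above can be read directly off the raw message bytes: serializers that use the Confluent wire format prepend a magic byte (0x0) and a 4-byte big-endian schema id to every Avro payload, and that id is what the consumer then asks the registry for. A minimal sketch for inspecting a record's header (the class name and sample bytes are illustrative, not from the question):

```java
import java.nio.ByteBuffer;

public class WireFormatCheck {

    // Confluent wire format: magic byte 0x0, then a 4-byte big-endian
    // schema id, then the Avro binary payload.
    public static int schemaIdOf(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        byte magic = buf.get();
        if (magic != 0) {
            throw new IllegalArgumentException("Not Confluent wire format, magic=" + magic);
        }
        return buf.getInt(); // the id the consumer looks up in the registry
    }

    public static void main(String[] args) {
        // A payload whose header claims schema id 0, matching the
        // "Error retrieving Avro schema for id 0" seen above.
        byte[] sample = {0, 0, 0, 0, 0, 42};
        System.out.println(schemaIdOf(sample)); // prints 0
    }
}
```

A schema id of 0 in the header is a strong hint that the producer side serialized the bytes without ever registering a schema for the topic.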

Of course I checked the Schema Registry for the subject:

curl -X GET http://localhost:8081/subjects/agg_value_9-value/versions
{"error_code":40401,"message":"Subject not found."}

But the same call for another topic, written by the Java application that produces the initial data, shows that the schema exists:

curl -X GET http://localhost:8081/subjects/CH-PGP-LP2_S20-002_agg-value/versions
[1]

Both applications use an identical "schema.registry.url" configuration. To summarize: the topic is created, data is written and can be read with a plain consumer, but it is binary and the schema doesn't exist.

I also tried to create a schema with Landoop to somehow match the data, but with no success. In any case that is not the proper way to use Kafka Streams: everything should be done on the fly.

Help, please!

When through() is called, the default serde defined via StreamsConfig is used unless the user explicitly overrides it. Which default serde did you use? To be correct, you should be using a serde backed by AbstractKafkaAvroSerializer, which will automatically register the schema for that through topic.
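One way to make the default serdes schema-aware is to put the Schema Registry URL into the streams configuration itself, since Kafka Streams hands its whole config map to the default serdes' configure() method. A sketch under that assumption (the application id, URLs, and the commented through() variant are illustrative):

```java
import java.util.Properties;

public class StreamsAvroConfig {

    // Build the streams configuration with the Schema Registry URL included,
    // so default serdes instantiated by Kafka Streams are configured
    // automatically and no per-serde configure() call is needed.
    public static Properties buildConfig() {
        Properties props = new Properties();
        props.put("application.id", "avro-transform-app");         // illustrative
        props.put("bootstrap.servers", "localhost:9092");          // illustrative
        props.put("default.key.serde",
                  "io.confluent.kafka.streams.serdes.avro.GenericAvroSerde");
        props.put("default.value.serde",
                  "io.confluent.kafka.streams.serdes.avro.GenericAvroSerde");
        props.put("schema.registry.url", "http://localhost:8081"); // illustrative
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildConfig().getProperty("schema.registry.url"));
    }
}
```

Alternatively, the serdes can be overridden just for the intermediate topic, e.g. stream.through("agg_value_9", Produced.with(keyGenericAvroSerde, valueGenericAvroSerde)) in newer Kafka Streams APIs; either way, the Avro serializer should then register the subject agg_value_9-value on the first write.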
