Pulsar function fails to deserialize message because of wrong schema type (JSON instead of AVRO)

When running Pulsar standalone in Docker, we hit a strange issue when deserializing messages in one specific case. We are using version 2.7.1.

We have a script that creates topics and functions. After it runs, a schema is created for the troublesome topic with type JSON. The schema definition itself is correct, but the type is not. All of this happens before any messages are sent. We have also enabled set-is-allow-auto-update-schema.

This topic, let's call it trouble-topic, is populated from two sources: ValidationFunction and a Spring Boot microservice.

ValidationFunction validates each message. If the message is valid, the function sends the mapped message to a topic consumed by the Spring Boot microservice, which applies some logic and then sends the result to trouble-topic. If validation fails, the function sends the message directly to trouble-topic.

When the Spring Boot microservice calls sendAsync on the following producer, the schema is updated to type AVRO, and TroubleFunction, which reads trouble-topic, works fine afterwards:

pulsarClient
    .newProducer(AvroSchema.of(OurClass.class))
    .topic(troubleTopicName)
    .create();

But if some messages fail validation first, and are therefore sent directly to trouble-topic before the above producer is ever used, we get a parsing exception. We send the message from the function in the following way:

context.newOutputMessage(troubleTopicName, AvroSchema.of(OurClass.class))
    .value(value)
    .sendAsync();

For some reason this does not update the schema type, which remains JSON. I verified the schema type at each step using the pulsar-admin CLI. When this happens before the microservice producer updates the schema type for the first time, TroubleFunction, which reads trouble-topic, fails with the following error:

11:43:49.322 [tenant/namespace/TroubleFunction-0] ERROR org.apache.pulsar.functions.instance.JavaInstanceRunnable - [tenant/namespace/TroubleFunction:0] Uncaught exception in Java Instance
org.apache.pulsar.client.api.SchemaSerializationException: com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 2)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: (byte[])avro-serialized-msg-i-have-to-hide Parsing exception: cvc-complex-type.2.4.a: Invalid content was found starting with element 'ElementName'. One of '{"foo:bar":ElementName}' is expected."; line: 1, column: 2]

So my question is: what is the difference between these two, and why does sending the message from the function not update the schema type correctly? Is it not using the same Producer underneath? Also, is there a way to fix this so that the schema type is set on initialization, or at least updated when the message is sent from a function?

First of all, credit where credit is due. I suppose this will be well documented one day, but right now it is not. I was fortunate enough to have an early-access (MEAP) copy of the book Apache Pulsar in Action, which uses this example repository to showcase some Pulsar functionality: https://github.com/david-streamlio/GottaEat

I highly recommend the book, and going through those examples, to everyone working with Pulsar. It was mentioned in the Pulsar Slack community that the book graduated from MEAP just yesterday, so a print edition should be available fairly soon. Consider joining the Pulsar Slack as well.


Answer:

This is the piece of code that helped me understand how this is supposed to work:

Map<String, ConsumerConfig> inputSpecs = new HashMap<String, ConsumerConfig> ();
inputSpecs.put("persistent://orders/inbound/food-orders", 
    ConsumerConfig.builder().schemaType("avro").build());
FunctionConfig functionConfig = 
    FunctionConfig.builder()
        ...
        .inputSpecs(inputSpecs)
        ...
        .build();

Java code like this can be used to set up the function when using LocalRunner, but the same configuration can be achieved with the pulsar-admin CLI (which we use) or the REST API. You can also use a function config file and specify the input specs in the YAML as follows:

inputSpecs:
 $topicName:
  schemaType: AVRO

As always, $topicName has the following format: persistent://tenant/namespace/topic
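For orientation, here is a rough sketch of how that fragment might sit inside a complete function config file, the kind passed to pulsar-admin functions create via --function-config-file. The tenant, namespace, and topic names are the placeholders used in this post, and the className is a hypothetical value, not the real one from our project.

```yaml
# Sketch of a function config file; all values are placeholders.
tenant: tenant
namespace: namespace
name: TroubleFunction
# Hypothetical fully qualified class name of the function
className: com.example.TroubleFunction
# Declare the input topic together with the schema type the function should expect
inputSpecs:
  persistent://tenant/namespace/trouble-topic:
    schemaType: AVRO
```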

Once you specify input specs for the consuming function (TroubleFunction in my case), the schema is created with the correct schema type, and deserialization works fine as well.
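As a side note on the error message itself: the "Illegal character (CTRL-CHAR, code 2)" part is what you would expect when a JSON parser is handed Avro binary data. The sketch below uses only the JDK (no Pulsar or Avro dependency) to show that Avro's zig-zag varint encoding of the value 1 is the single byte 0x02, which cannot begin a JSON document. That the first byte of our payload came from a value of 1 (for example a union branch index or a short length prefix) is only an assumption; the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: shows why Avro binary trips a JSON parser on the first byte.
final class AvroFirstByteDemo {

    // Avro binary encoding of a long: zig-zag mapping, then base-128 varint
    // (low 7 bits first, high bit set on every byte except the last).
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    // True if the byte may legally appear at the start of a JSON text:
    // whitespace, or the first character of an object, array, string,
    // number, or one of the literals true/false/null.
    static boolean canStartJson(byte b) {
        char c = (char) (b & 0xFF);
        return c == '{' || c == '[' || c == '"' || c == '-'
            || (c >= '0' && c <= '9')
            || c == 't' || c == 'f' || c == 'n'
            || c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    public static void main(String[] args) {
        byte first = encodeLong(1L)[0];
        System.out.println("first byte of Avro-encoded 1 = " + first);
        System.out.println("can start a JSON document?   = " + canStartJson(first));
    }
}
```

So a consumer that believes the topic schema is JSON fails on the very first byte of an Avro-encoded message, which matches the column reported in the stack trace.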
