Pulsar function fails to deserialize message because of wrong schema type (JSON instead of AVRO)
When running Pulsar in Docker as standalone, we are facing this weird issue when deserializing a message in a specific case. We are using version 2.7.1.

We have a script that creates topics and functions, after which a schema gets created for the troublesome topic with type JSON. The whole schema is correct, but the type is not. This all happens before any messages are sent. We also enabled set-is-allow-auto-update-schema.
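For reference, we enabled it along these lines with the pulsar-admin CLI (the tenant/namespace name below is a placeholder for our actual namespace):

```shell
# Allow producers to auto-update the topic schema in this namespace
bin/pulsar-admin namespaces set-is-allow-auto-update-schema --enable tenant/namespace
```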
This topic, let's call it trouble-topic, is populated from two sources: ValidationFunction and a Spring Boot microservice. ValidationFunction validates the message; if the message is valid, it sends the mapped message to a topic consumed by the Spring Boot microservice, which then applies some logic and sends the result to trouble-topic. If validation fails, the function sends the message directly to trouble-topic.
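To make the flow concrete, here is a rough sketch of what ValidationFunction does. This is not our actual code: isValid, mapMessage, and the topic-name fields are placeholders.

```java
import org.apache.pulsar.client.impl.schema.AvroSchema;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class ValidationFunction implements Function<OurClass, Void> {
    @Override
    public Void process(OurClass input, Context context) throws Exception {
        if (isValid(input)) {
            // valid messages go to the intermediate topic consumed
            // by the Spring Boot microservice
            context.newOutputMessage(intermediateTopicName, AvroSchema.of(OurClass.class))
                    .value(mapMessage(input))
                    .sendAsync();
        } else {
            // invalid messages are sent directly to trouble-topic
            context.newOutputMessage(troubleTopicName, AvroSchema.of(OurClass.class))
                    .value(input)
                    .sendAsync();
        }
        return null;
    }
    // isValid, mapMessage, intermediateTopicName, troubleTopicName
    // omitted; they stand in for our real validation and mapping logic
}
```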
When using sendAsync from the Spring Boot microservice with the following producer, the schema gets updated, has AVRO as its type, and TroubleFunction reading trouble-topic works fine afterwards:

pulsarClient
    .newProducer(AvroSchema.of(OurClass.class))
    .topic(troubleTopicName)
    .create()
But if, before that, some messages fail validation and are sent directly to trouble-topic before the above producer is used, we get a parsing exception. We send the message from the function in the following way:

context.newOutputMessage(troubleTopicName, AvroSchema.of(OurClass.class))
    .value(value)
    .sendAsync();
For some reason this does not update the schema type, which remains JSON. I validated the schema type at each step using the pulsar-admin CLI. When this happens before the microservice producer updates the schema type for the first time, TroubleFunction reading trouble-topic fails with the following error:
11:43:49.322 [tenant/namespace/TroubleFunction-0] ERROR org.apache.pulsar.functions.instance.JavaInstanceRunnable - [tenant/namespace/TroubleFunction:0] Uncaught exception in Java Instance
org.apache.pulsar.client.api.SchemaSerializationException: com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 2)): only regular white space (\r, \n, \t) is allowed between tokens
at [Source: (byte[])avro-serialized-msg-i-have-to-hide Parsing exception: cvc-complex-type.2.4.a: Invalid content was found starting with element 'ElementName'. One of '{"foo:bar":ElementName}' is expected."; line: 1, column: 2]
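For anyone wanting to reproduce the check, the schema type can be inspected at each step roughly like this (the topic name is a placeholder):

```shell
# Prints the current schema for the topic, including its "type" field
bin/pulsar-admin schemas get persistent://tenant/namespace/trouble-topic
```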
So my question is: what is the difference between these two, and why does sending the message from the function not update the schema type correctly? Is it not using the same producer underneath? Also, is there a way to fix this so that the schema type is set on initialization, or at least updated when the message is sent from a function?
First of all, credit where credit is due. I suppose this will be well documented one day, but right now it is not. I was fortunate enough to have an EAP version of the Apache Pulsar in Action book, where this example repository is used to showcase some Pulsar functionality: https://github.com/david-streamlio/GottaEat

I highly recommend the book, and working through those examples, to everyone using Pulsar. There was some mention on the Pulsar Slack community that it just graduated from MEAP and should be available in a print edition rather soon, so check it out. Also consider joining the Pulsar Slack.
Answer:

This is the piece of code that allowed me to understand how this is supposed to work:
Map<String, ConsumerConfig> inputSpecs = new HashMap<String, ConsumerConfig>();
inputSpecs.put("persistent://orders/inbound/food-orders",
        ConsumerConfig.builder().schemaType("avro").build());

FunctionConfig functionConfig =
    FunctionConfig.builder()
        ...
        .inputSpecs(inputSpecs)
        ...
        .build();
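For completeness, a FunctionConfig built like this can then be handed to the LocalRunner, roughly as follows (a sketch, assuming the pulsar-functions-local-runner dependency is on the classpath):

```java
import org.apache.pulsar.functions.LocalRunner;

// Run the function locally with the input specs configured above
LocalRunner localRunner = LocalRunner.builder()
        .functionConfig(functionConfig)
        .build();
localRunner.start(false); // false = do not block the calling thread
```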
Java code can be used to set up the function when using the LocalRunner, but the same configuration can be achieved using the pulsar-admin CLI (which we use) and the REST API. You can also use a function config file and specify it in the following way in the configuration YAML:
inputSpecs:
$topicName:
schemaType: AVRO
$topicName is, as always, in the following format: persistent://tenant/namespace/topic
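With the pulsar-admin CLI, the same thing can be passed as a JSON map of topic to consumer configuration via --input-specs. A sketch, where the tenant, namespace, jar, and class names are all placeholders:

```shell
# --input-specs maps each input topic to its consumer configuration
bin/pulsar-admin functions create \
  --tenant tenant \
  --namespace namespace \
  --name TroubleFunction \
  --jar functions.jar \
  --classname com.example.TroubleFunction \
  --input-specs '{"persistent://tenant/namespace/trouble-topic":{"schemaType":"avro"}}'
```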
Once you specify input specs for, in my case, TroubleFunction, the schema will be created with the correct schema type, and deserialization will work perfectly fine as well.