Flink deserialize Kafka JSON
I am trying to read a JSON message from a Kafka topic with Flink.
I am using Kafka 2.4.1 and Flink 1.10.
For my consumer I have set:
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.util.serialization.JSONKeyValueDeserializationSchema;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;

FlinkKafkaConsumer<ObjectNode> sensorConsumer = new FlinkKafkaConsumer<>(
        KAFKA_TOPIC_INPUT, new JSONKeyValueDeserializationSchema(false), properties);
When I use SimpleStringSchema I get the JSON as text, which is fine, but with the JSONKeyValueDeserializationSchema I get:
Caused by: org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'sensor_5': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
sensor_5 is the key of the Kafka message. I am guessing that I need to add something else so that the JSON from the Kafka message value is fed to the deserializer and the key is handled some other way, but I am not sure how. Any suggestions?
The JSON structure is:
{"value": 1.0, "timestamp": "2020-05-01 14:00:00.000000"}
and it is submitted via:
# Python 3
import json
from confluent_kafka import Producer
dict_obj = {"value": 1.0, "timestamp": "2020-05-01 14:00:00.000000"}
producer = Producer({'bootstrap.servers': "kafka:9092"})
producer.produce(topic='sensors-raw', key='sensor_5', value=json.dumps(dict_obj))
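Note what this producer actually puts on the topic: the value is JSON, but the key is the raw bytes of the plain string sensor_5, which is not a valid JSON document. A quick Python sketch of why any JSON parser rejects it (mirroring Jackson's "Unrecognized token 'sensor_5'" error):

```python
import json

value = json.dumps({"value": 1.0, "timestamp": "2020-05-01 14:00:00.000000"})
key = "sensor_5"  # sent as a plain string, NOT JSON-encoded

json.loads(value)  # parses fine, yields a dict

try:
    json.loads(key)  # "sensor_5" without quotes is not a JSON document
except json.JSONDecodeError as e:
    print("key is not JSON:", e)

# By contrast, json.dumps(key) would produce '"sensor_5"' (with quotes),
# which IS valid JSON and would parse without error.
```

If the producer had sent `key=json.dumps('sensor_5')`, the JSONKeyValueDeserializationSchema would have accepted the key as-is.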
So, basically, if you take a look at the source code of JSONKeyValueDeserializationSchema, you can see that its deserialize method looks like below:
if (mapper == null) {
    mapper = new ObjectMapper();
}
ObjectNode node = mapper.createObjectNode();
if (record.key() != null) {
    node.set("key", mapper.readValue(record.key(), JsonNode.class));
}
if (record.value() != null) {
    node.set("value", mapper.readValue(record.value(), JsonNode.class));
}
if (includeMetadata) {
    node.putObject("metadata")
        .put("offset", record.offset())
        .put("topic", record.topic())
        .put("partition", record.partition());
}
return node;
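The same logic can be mimicked outside Flink to show exactly where it breaks for the record above, and what a string-tolerant variant would produce instead. This is only an illustrative Python sketch (the function name and the parse_key_as_json flag are made up for the demo; the real schema works on Jackson's tree model):

```python
import json

def deserialize(key_bytes, value_bytes, parse_key_as_json=True):
    """Mimic JSONKeyValueDeserializationSchema: both the key and the
    value are run through a JSON parser and collected into one record."""
    node = {}
    if key_bytes is not None:
        if parse_key_as_json:
            node["key"] = json.loads(key_bytes)       # what Flink's schema does
        else:
            node["key"] = key_bytes.decode("utf-8")   # treat key as plain text
    if value_bytes is not None:
        node["value"] = json.loads(value_bytes)
    return node

value = b'{"value": 1.0, "timestamp": "2020-05-01 14:00:00.000000"}'

# What the question's setup does: parsing b"sensor_5" as JSON raises,
# just like Jackson's JsonParseException in the Flink job.
try:
    deserialize(b"sensor_5", value)
except json.JSONDecodeError:
    print("fails, just like the Flink job")

# Treating the key as a plain string instead works:
record = deserialize(b"sensor_5", value, parse_key_as_json=False)
print(record["key"])             # sensor_5
print(record["value"]["value"])  # 1.0
```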
So, generally, the schema expects your key to be JSON, not a plain String, and it will therefore fail on sensor_5. I think the best and simplest solution would be to create your own implementation that takes a String as the key.
You can implement DeserializationSchema instead of KeyedDeserializationSchema if you don't want to include the key in your record.
An example would be like the following:
import java.io.IOException;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonNode;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;

import static org.apache.flink.api.java.typeutils.TypeExtractor.getForClass;

public class JSONValueDeserializationSchema implements DeserializationSchema<ObjectNode> {
    private static final long serialVersionUID = -1L;

    private ObjectMapper mapper;

    @Override
    public ObjectNode deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        ObjectNode node = mapper.createObjectNode();
        if (message != null) {
            // Only the Kafka record value reaches this method; the key is ignored.
            node.set("value", mapper.readValue(message, JsonNode.class));
        }
        return node;
    }

    @Override
    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }

    @Override
    public TypeInformation<ObjectNode> getProducedType() {
        return getForClass(ObjectNode.class);
    }
}
If you want to include the key as well in your record, you can implement KeyedDeserializationSchema as mentioned in the answer by Dominik Wosiński.