Flink deserialize Kafka JSON

I am trying to read a JSON message from a Kafka topic with Flink.

I am using Kafka 2.4.1 and Flink 1.10.

For my consumer I have set:

import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.util.serialization.JSONKeyValueDeserializationSchema;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;


FlinkKafkaConsumer<ObjectNode> sensorConsumer = new FlinkKafkaConsumer<>(KAFKA_TOPIC_INPUT,
                new JSONKeyValueDeserializationSchema(false), properties);

When I use SimpleStringSchema I get the JSON as text, which is fine, but with JSONKeyValueDeserializationSchema I get:

Caused by: org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'sensor_5': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')

sensor_5 is a key in the topic. I am guessing that I need to add something else so that the JSON from the Kafka message value is fed to the deserializer and the key is handled somehow, but I am not sure.

Any suggestions?

The JSON structure is:

{"value": 1.0, "timestamp": "2020-05-01 14:00:00.000000"}

and it is submitted via:

# Python 3
import json
from confluent_kafka import Producer

dict_obj = {"value": 1.0, "timestamp": "2020-05-01 14:00:00.000000"}
producer = Producer({'bootstrap.servers': "kafka:9092"})

producer.produce(topic='sensors-raw', key='sensor_5', value=json.dumps(dict_obj))

So, basically, if you take a look at the source code of JSONKeyValueDeserializationSchema, you can see that it looks like this:

    if (mapper == null) {
        mapper = new ObjectMapper();
    }
    ObjectNode node = mapper.createObjectNode();
    if (record.key() != null) {
        node.set("key", mapper.readValue(record.key(), JsonNode.class));
    }
    if (record.value() != null) {
        node.set("value", mapper.readValue(record.value(), JsonNode.class));
    }
    if (includeMetadata) {
        node.putObject("metadata")
            .put("offset", record.offset())
            .put("topic", record.topic())
            .put("partition", record.partition());
    }
    return node;

So, generally, the schema expects your key to be JSON, not a plain string, so it fails for sensor_5. I think the best and simplest solution would be to create your own implementation that takes a String as the key.
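If you control the producer, another option (an assumption here, not something proposed in the answer above) is to send the key as a JSON string so that the stock JSONKeyValueDeserializationSchema can parse it. A minimal sketch using only Python's json module shows the difference between the raw key and a JSON-encoded one; parses_as_json is a hypothetical helper for illustration:

```python
import json

# The key exactly as the producer in the question sends it: not valid JSON.
raw_key = b"sensor_5"

# The same key encoded as a JSON string literal (with surrounding quotes).
json_key = json.dumps("sensor_5").encode("utf-8")


def parses_as_json(payload: bytes) -> bool:
    """Return True if the payload is a complete, valid JSON document."""
    try:
        json.loads(payload)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json(raw_key))
print(parses_as_json(json_key))
```

With the producer from the question, that would mean passing key=json.dumps('sensor_5') to producer.produce(...). The key should then appear as a text node under "key" in the resulting ObjectNode.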

You can implement DeserializationSchema instead of KeyedDeserializationSchema if you don't want to include your key in your record.

An example would look like the following:

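import java.io.IOException;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonNode;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;

import static org.apache.flink.api.java.typeutils.TypeExtractor.getForClass;

public class JSONValueDeserializationSchema implements DeserializationSchema<ObjectNode> {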

    private static final long serialVersionUID = -1L;

    private ObjectMapper mapper;

    @Override
    public ObjectNode deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        ObjectNode node = mapper.createObjectNode();
        if (message != null) {
            node.set("value", mapper.readValue(message, JsonNode.class));
        }
        return node;
    }

    @Override
    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }

    @Override
    public TypeInformation<ObjectNode> getProducedType() {
        return getForClass(ObjectNode.class);
    }
}
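To make the shape of the resulting record concrete, here is a small Python model of the wrapping logic above. It is not Flink code: deserialize_value is a hypothetical stand-in that mirrors the Java deserialize method, applied to the payload from the question:

```python
import json
from typing import Optional


def deserialize_value(message: Optional[bytes]) -> dict:
    """Mirror of the Java logic: wrap the parsed payload under "value"."""
    node = {}
    if message is not None:
        node["value"] = json.loads(message)
    return node


# Payload taken from the question, encoded the way Kafka would deliver it.
payload = json.dumps(
    {"value": 1.0, "timestamp": "2020-05-01 14:00:00.000000"}
).encode("utf-8")

record = deserialize_value(payload)

# The payload's own "value" field sits inside the wrapper's "value" field.
reading = record["value"]["value"]
```

Note the double nesting: because the payload itself has a "value" field, the sensor reading in Flink would be reached with node.get("value").get("value").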

If you also want to include the key in your record, you can implement KeyedDeserializationSchema, as mentioned in the answer by Dominik Wosiński.
