
Kafka message includes control characters (MongoDB Source Connector)

I have a Kafka Connect MongoDB Source Connector working (both via Confluent Platform), but the messages it creates contain a control character at the start, which makes downstream parsing of the message to JSON harder than I imagine it should be.

The source connector that's running:

{
    "name": "mongo-source-connector",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://myUsername:myPassword@my-mongodb-host-address:27017",
        "database": "myDatabase",
        "collection": "myCollection",
        "change.stream.full.document": "updateLookup",
        "errors.log.enable": true
    }
}

The message created in the Kafka topic by this source connector is as follows (notice the leading control character):

�{"_id": {"_data": "82609E8726000000012B022C0100296E5A1004BE208B099BCF4106822DE274B0B9D39A46645F69640064609E87267125D17D12D180620004"}, "operationType": "insert", "clusterTime": {"$timestamp": {"t": 1621002022, "i": 1}}, "fullDocument": {"_id": {"$oid": "609e87267125d17d12d18062"}, "uuid": "23534a5c-ad82-431c-a821-6b4aed4f59a1", "endingNumber": 10}, "ns": {"db": "myDatabase", "coll": "myCollection"}, "documentKey": {"_id": {"$oid": "609e87267125d17d12d18062"}}}

The control character makes downstream parsing difficult because it turns otherwise valid JSON into invalid JSON. I don't know why it's there or how to get rid of it.

I could, I guess, strip out junk like this control character before treating the message as JSON, but that seems like a band-aid I'd like to avoid.

This is how I'm handling the message now. I think it's irrelevant, since I've tested that it works with valid JSON that lacks the control character, but here it is in case it matters:


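import com.fasterxml.jackson.annotation.JsonProperty
import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper

data class MyChangesetMessageId (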
    @JsonProperty("_data")
    val data: String
)

data class MyChangesetMessageTimestamp (
    val t: Long,
    val i: Int
)

data class MyChangesetMessageClusterTime (
    @JsonProperty("\$timestamp")
    val timestamp: MyChangesetMessageTimestamp
)

data class MyChangesetOid (
    @JsonProperty("\$oid")
    val oid: String
)

data class MyChangesetMessageFullDocument (
    @JsonProperty("_id")
    val id: MyChangesetOid,
    val uuid: String,
    val endingNumber: Int
)

data class MyChangesetMessageNS (
    val db: String,
    val coll: String
)

data class MyChangesetDocumentKey (
    @JsonProperty("_id")
    val id: MyChangesetOid
)

data class MyChangesetMessage (
    @JsonProperty("_id")
    val id: MyChangesetMessageId,
    val operationType: String,
    val clusterTime: MyChangesetMessageClusterTime,
    val fullDocument: MyChangesetMessageFullDocument,
    val ns: MyChangesetMessageNS,
    val documentKey: MyChangesetDocumentKey
)

...

val objectMapper = jacksonObjectMapper()
val changesetMessage = objectMapper.readValue(message, MyChangesetMessage::class.java)

Any ideas are appreciated.

The character you're referring to is typical of Avro-serialized data that has been decoded into a string.

Check the key/value converter settings in the Connect worker, since you haven't defined them in the connector config.

If you want to parse the messages as JSON, use the JsonConverter. Alternatively, Avro would work as well if you'd rather skip the hand-written data class definitions and generate them from the Avro schema.
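As a rough sketch of that converter change, the overrides below could be added to the connector config from the question instead of relying on the worker defaults. It assumes the stock `org.apache.kafka.connect.storage.StringConverter` and `org.apache.kafka.connect.json.JsonConverter` classes that ship with Kafka Connect; which combination gives the shape you want depends on the connector's output format, so treat it as a starting point rather than a confirmed fix:

```json
{
    "name": "mongo-source-connector",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://myUsername:myPassword@my-mongodb-host-address:27017",
        "database": "myDatabase",
        "collection": "myCollection",
        "change.stream.full.document": "updateLookup",
        "errors.log.enable": true,
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": false
    }
}
```

If the value then shows up as a quoted, escaped JSON string rather than a JSON object, switching `value.converter` to `StringConverter` as well may be the other option to try.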
