
How to configure a sink connector for non-JSON messages when streaming data from Kafka to MongoDB

I can stream data from Kafka to MongoDB.

I produce a JSON message with kafka-console-producer.sh (for example: {"id": 1, "test": 123}), and it is inserted into MongoDB successfully.

However, if I produce a non-JSON message (for example: abc), I get the following error because JSON parsing fails:

[2019-07-09 14:44:30,365] ERROR WorkerSinkTask{id=mongo-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
        at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:344)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
        ... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'abc': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"abc"; line: 1, column: 7]
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'abc': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"abc"; line: 1, column: 7]
        at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:703)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3532)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2627)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:832)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:729)
        at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4042)
        at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2571)
        at org.apache.kafka.connect.json.JsonDeserializer.deserialize(JsonDeserializer.java:50)
        at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:342)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[2019-07-09 14:44:30,374] ERROR WorkerSinkTask{id=mongo-sink-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)

The sink connector configuration file is MongoSinkConnector.properties with the following content (I created it based on this source):

name=mongo-sink
topics=test
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
key.ignore=true

# Specific global MongoDB Sink Connector configuration
connection.uri=mongodb://localhost:27017
database=test_kafka
collection=transaction
max.num.retries=3
retries.defer.timeout=5000
type.name=kafka-connect

key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false

I have two separate questions:

  1. How can I ignore messages that are not in JSON format?

  2. How can I define a default key for this kind of message (for example: abc -> { "non-json": "abc" })?

A string is technically "JSON format"; however, the JsonConverter typically expects JSON objects, or in some cases null, not plain strings.

There is no property to set a default value for errors, but you can either simply skip the bad events or send them to a different topic, which you would then need to process and fix separately if you expect those messages to also end up in the database. This is configured with the errors.tolerance setting (only available since Kafka Connect 2.0).
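
For reference, a minimal sketch of the relevant error-handling properties, assuming Kafka Connect 2.0 or newer (the dead letter queue topic name below is just a placeholder):

# Skip records that fail conversion instead of killing the task
errors.tolerance=all
# Optional: route failed raw records to a dead letter queue topic for later inspection
errors.deadletterqueue.topic.name=dlq.mongo-sink
errors.deadletterqueue.context.headers.enable=true
# Use 1 on a single-broker development cluster
errors.deadletterqueue.topic.replication.factor=1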

Refer to https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues

If the data is just "abc", then I would suggest using StringConverter rather than JsonConverter anyway.
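
Only the value converter lines in MongoSinkConnector.properties would need to change; a sketch, assuming the rest of the file stays as-is:

value.converter=org.apache.kafka.connect.storage.StringConverter
# value.converter.schemas.enable can be dropped, since StringConverter does not use it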

But if you want to wrap all events in a JSON object (not just plain strings), you would have to add the transforms property:

transforms=MakeMap 
transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value 
transforms.MakeMap.field=non-json 

This would turn "abc" into {"non-json":"abc"}, but it would also turn {"id": 1, "test": 123} into {"non-json":{"id": 1, "test": 123}}, so you need to be careful about what you are producing into the topic.
