how to configure a sink connector for non-JSON format messages when streaming data from Kafka to MongoDB
I can stream data from Kafka to MongoDB. I produce messages with kafka-console-producer.sh. When I send a JSON message (for example: {"id": 1, "test": 123}), it is inserted into MongoDB successfully. However, if I stream a non-JSON message (for example: abc), I get the following error because of JSON parsing:
[2019-07-09 14:44:30,365] ERROR WorkerSinkTask{id=mongo-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:487)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:344)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:487)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'abc': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"abc"; line: 1, column: 7]
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'abc': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"abc"; line: 1, column: 7]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:703)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3532)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2627)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:832)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:729)
at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4042)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2571)
at org.apache.kafka.connect.json.JsonDeserializer.deserialize(JsonDeserializer.java:50)
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:342)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:487)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:487)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2019-07-09 14:44:30,374] ERROR WorkerSinkTask{id=mongo-sink-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
The sink connector configuration file is MongoSinkConnector.properties with the following content (I created it based on this source):
name=mongo-sink
topics=test
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
key.ignore=true
# Specific global MongoDB Sink Connector configuration
connection.uri=mongodb://localhost:27017
database=test_kafka
collection=transaction
max.num.retries=3
retries.defer.timeout=5000
type.name=kafka-connect
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
I have two separate questions:

1. How do I ignore messages that are not in JSON format?
2. How do I define a default key for this kind of message (for example: abc -> { "non-json": "abc" })?
A plain string is technically "JSON format"; however, the JsonConverter typically expects JSON objects, or in some cases null, not bare strings.
There is no property to set a default value for errors, but you can either simply skip the bad events or send them to a different topic, which you must then process and fix separately if you expect those messages to also end up in the database. This is configured with the errors.tolerance setting (only available since Kafka Connect 2.0).
Refer to https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues
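For example, a minimal sketch of the extra properties (the dead letter queue topic name mongo-sink-dlq is only a placeholder):

# Skip records that fail conversion instead of killing the task
errors.tolerance=all
# Optionally capture failed raw records in a dead letter queue topic
errors.deadletterqueue.topic.name=mongo-sink-dlq
# Replication factor 1 is enough for a single-broker development cluster
errors.deadletterqueue.topic.replication.factor=1
# Log the failing records and error details for debugging
errors.log.enable=true
errors.log.include.messages=true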
If the data is just "abc", then I would suggest using the StringConverter rather than the JsonConverter anyway, as sketched below.
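A minimal sketch of that change in MongoSinkConnector.properties (this assumes the sink can accept a raw string value; in practice you will likely also want the HoistField transform described below to wrap the string in a document):

# Pass the raw bytes through as a plain UTF-8 string instead of parsing JSON
value.converter=org.apache.kafka.connect.storage.StringConverter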
But if you wanted to wrap all events in a JSON object (not just plain strings), you would have to add a transforms property:
transforms=MakeMap
transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.MakeMap.field=non-json
This would turn "abc" into {"non-json":"abc"}, but it would also turn {"id": 1, "test": 123} into {"non-json":{"id": 1, "test": 123}}, so you need to be careful about what you are producing into the topic.
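Putting the two answers together, a sketch of the additions to MongoSinkConnector.properties might look like this (again, the dead letter queue topic name is only an example):

# Question 1: tolerate records that fail conversion instead of killing the task
errors.tolerance=all
errors.deadletterqueue.topic.name=mongo-sink-dlq

# Question 2: wrap each successfully converted value in a document
transforms=MakeMap
transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.MakeMap.field=non-json

Keep in mind that converters run before transforms, so with the JsonConverter a record like abc fails conversion and is skipped (or dead-lettered) before HoistField ever sees it; if you want abc wrapped as {"non-json": "abc"} rather than skipped, switch to the StringConverter as described above.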