
How to stream data from Kafka to MongoDB using a Kafka connector

I want to stream data from Kafka to MongoDB using a Kafka connector. I found this one: https://github.com/hpgrahsl/kafka-connect-mongodb . But it doesn't include step-by-step setup instructions.

Everything I find by googling seems to point to Confluent Platform, which I don't want to use.

Could anyone share a document or guideline on how to use kafka-connect-mongodb to stream data from Kafka to MongoDB, without using Confluent Platform or another Kafka connector?

Thank you in advance.


What I tried

Step 1: I downloaded mongo-kafka-connect-0.1-all.jar from Maven Central.

Step 2: I copied the jar file into a new plugins folder inside Kafka (I use Kafka on Windows, so the directory is D:\git\1.libraries\kafka_2.12-2.2.0\plugins).

Step 3: I edited connect-standalone.properties by adding a new line: plugin.path=/git/1.libraries/kafka_2.12-2.2.0/plugins

Step 4: I added a new config file for the MongoDB sink, MongoSinkConnector.properties:

name=mongo-sink
topics=test
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
key.ignore=true

# Specific global MongoDB Sink Connector configuration
connection.uri=mongodb://localhost:27017,mongo1:27017,mongo2:27017,mongo3:27017
database=test_kafka
collection=transaction
max.num.retries=3
retries.defer.timeout=5000
type.name=kafka-connect

Step 5: I ran the command bin\windows\connect-standalone.bat config\connect-standalone.properties config\MongoSinkConnector.properties

But I get this error:

[2019-07-09 10:19:09,466] WARN The configuration 'offset.flush.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-07-09 10:19:09,467] WARN The configuration 'key.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-07-09 10:19:09,467] WARN The configuration 'offset.storage.file.filename' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-07-09 10:19:09,468] WARN The configuration 'value.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-07-09 10:19:09,469] WARN The configuration 'plugin.path' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-07-09 10:19:09,469] WARN The configuration 'value.converter' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-07-09 10:19:09,470] WARN The configuration 'key.converter' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
Jul 09, 2019 10:19:10 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource will be ignored.
Jul 09, 2019 10:19:10 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.RootResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.RootResource will be ignored.
Jul 09, 2019 10:19:10 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
Jul 09, 2019 10:19:11 AM org.glassfish.jersey.internal.Errors logErrors
WARNING: The following warnings have been detected: WARNING: The (sub)resource method listConnectors in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method createConnector in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method listConnectorPlugins in org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource contains empty path annotation.
WARNING: The (sub)resource method serverInfo in org.apache.kafka.connect.runtime.rest.resources.RootResource contains empty path annotation.

[2019-07-09 10:19:12,302] ERROR WorkerSinkTask{id=mongo-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
        at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:344)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
        ... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'this': was expecting 'null', 'true', 'false' or NaN
 at [Source: (byte[])"this is a message"; line: 1, column: 6]
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'this': was expecting 'null', 'true', 'false' or NaN
 at [Source: (byte[])"this is a message"; line: 1, column: 6]
        at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:703)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3532)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3508)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._matchToken2(UTF8StreamJsonParser.java:2843)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._matchTrue(UTF8StreamJsonParser.java:2777)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:807)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:729)
        at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4042)
        at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2571)
        at org.apache.kafka.connect.json.JsonDeserializer.deserialize(JsonDeserializer.java:50)
        at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:342)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:487)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[2019-07-09 10:19:12,305] ERROR WorkerSinkTask{id=mongo-sink-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)

Which configuration did I set wrong, or what am I missing?


I fixed it. Now I can stream data from Kafka to MongoDB successfully.

My fix was:

  1. Move my Kafka installation to C:\kafka_2.12-2.2.0
  2. Update plugin.path to match the new path
  3. Update the config file connect-standalone.properties (see the sketch below)
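
For reference, the updated line in connect-standalone.properties looks like this (a sketch; it assumes the plugins folder sits directly under the new Kafka directory):

# forward slashes avoid backslash-escaping issues in .properties files on Windows
plugin.path=C:/kafka_2.12-2.2.0/plugins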

There is an official source and sink connector from MongoDB themselves. It is available on Confluent Hub: https://www.confluent.io/hub/mongodb/kafka-connect-mongodb

If you don't want to use Confluent Platform, you can deploy Apache Kafka yourself; it already includes Kafka Connect. Which plugins (connectors) you use with it is up to you. In this case you would be using Kafka Connect (part of Apache Kafka) plus kafka-connect-mongodb (provided by MongoDB), as sketched below.
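
As a rough sketch (the connector version and paths here are placeholders, not prescribed by Kafka or MongoDB), a plain Apache Kafka download plus the connector jar is all it takes:

# make the MongoDB connector jar visible to Kafka Connect via plugin.path
mkdir -p /opt/kafka/plugins
cp mongo-kafka-connect-1.3.0-all.jar /opt/kafka/plugins/
echo "plugin.path=/opt/kafka/plugins" >> config/connect-standalone.properties
# start a standalone Connect worker with the sink's config file
bin/connect-standalone.sh config/connect-standalone.properties config/MongoSinkConnector.properties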

Documentation on how to use it is here: https://docs.mongodb.com/kafka-connector/current/

Even though this question is a little old, here is how I connected kafka_2.12-2.6.0 to MongoDB (version 4.4) on an Ubuntu system:

a. Download the MongoDB connector '*-all.jar' from here. A MongoDB-Kafka connector jar whose name ends with 'all' also contains all of the connector's dependencies.

b. Drop this jar file into your Kafka lib folder.

c. Configure 'connect-standalone_bare.properties' as:

bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000

d. Configure 'MongoSinkConnector.properties' as:

name=mongo-sink
topics=test
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
key.ignore=true
connection.uri=mongodb://localhost:27017
database=test_kafka
collection=transaction
max.num.retries=3
retries.defer.timeout=5000
type.name=kafka-connect
schemas.enable=false

Place both 'properties' files here: $HOME/Documents/kafka/config

e. Start the connector process:

export folder_path="$HOME/Documents/kafka/config"
connect-standalone.sh  $folder_path/connect-standalone_bare.properties $folder_path/MongoSinkConnector.properties

f. In Kafka, start the ZooKeeper server and the Kafka server. Create the topic 'test'. On the mongod server, create the database 'test_kafka' and, under it, a collection 'transaction' (a sketch of these commands follows).
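
A minimal sketch of those commands, assuming a default local setup (ports, paths, and the single-partition topic are illustrative):

# start ZooKeeper and the Kafka broker, each in its own terminal
zookeeper-server-start.sh config/zookeeper.properties
kafka-server-start.sh config/server.properties

# create the 'test' topic
kafka-topics.sh --create --bootstrap-server localhost:9092 --topic test --partitions 1 --replication-factor 1

# pre-create the database and collection (MongoDB would also create them on first insert)
mongo --eval 'db.getSiblingDB("test_kafka").createCollection("transaction")'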

g. Start a Kafka producer:

kafka-console-producer.sh --broker-list localhost:9092  --topic test

Then make an entry: {"abc" : "def" }

You should be able to see it in MongoDB (db.transaction.find()).
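
One way to check, from the mongo shell:

mongo test_kafka --eval 'db.transaction.find().pretty()'
# the inserted document should come back containing "abc" : "def"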
