Implement a dockerized Kafka sink connector to MongoDB
I am trying to implement a Kafka connection to MongoDB and MySQL using Docker.
What I want is the following figure:
Kafka Connect MongoDB:
I have seen the docker-compose of the official MongoDB repository. It has two problems:

It is too complicated for my purpose, because it runs multiple MongoDB containers and also uses many images that consume a lot of resources.

It has some unresolved issues that end in a malfunctioning Kafka-to-MongoDB connection. Here you can see my issue.

What I have implemented in docker-compose.yml, using Debezium for the connection, is the following:
version: '3.2'
services:
  kafka:
    image: wurstmeister/kafka:latest
    ports:
      - target: 9094
        published: 9094
        protocol: tcp
        mode: host
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092
      KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://:9094
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      KAFKA_LOG_DIRS: /kafka/logs
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - kafka:/kafka
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
    volumes:
      - zookeeper:/opt/zookeeper-3.4.13
  mongo:
    image: mongo
    container_name: mongo
    ports:
      - 27017:27017
  connect:
    image: debezium/connect
    container_name: connect
    ports:
      - 8083:8083
    environment:
      - BOOTSTRAP_SERVERS=kafka:9092
      - GROUP_ID=1
      - CONFIG_STORAGE_TOPIC=my_connect_configs
      - OFFSET_STORAGE_TOPIC=my_connect_offsets
volumes:
  kafka:
  zookeeper:
As @cricket_007 says, I should not use Debezium for my purpose. So I have used the confluentinc/kafka-connect-datagen image instead. Here is what I added to the docker-compose.yml file in place of debezium:
connect:
  image: confluentinc/kafka-connect-datagen
  build:
    context: .
    dockerfile: Dockerfile
  hostname: connect
  container_name: connect
  depends_on:
    - zookeeper
  ports:
    - 8083:8083
  environment:
    CONNECT_BOOTSTRAP_SERVERS: 'kafka:9092'
    CONNECT_REST_ADVERTISED_HOST_NAME: connect
    CONNECT_REST_PORT: 8083
    CONNECT_GROUP_ID: compose-connect-group
    CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
    CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
    CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
    CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
    CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
    CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
    CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
    CONNECT_KEY_CONVERTER: io.confluent.connect.avro.AvroConverter
    CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
    CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
    CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
    CONNECT_LOG4J_ROOT_LOGLEVEL: "INFO"
    CONNECT_PLUGIN_PATH: /usr/share/confluent-hub-components
    CONNECT_ZOOKEEPER_CONNECT: 'zookeeper:2181'
    # Assumes image is based on confluentinc/kafka-connect-datagen:latest which is pulling 5.2.2 Connect image
    CLASSPATH: /usr/share/java/monitoring-interceptors/monitoring-interceptors-5.2.2.jar
    CONNECT_PRODUCER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor"
    CONNECT_CONSUMER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor"
  command: "bash -c 'if [ ! -d /usr/share/confluent-hub-components/confluentinc-kafka-connect-datagen ]; then echo \"WARNING: Did not find directory for kafka-connect-datagen (did you remember to run: docker-compose up -d --build ?)\"; fi ; /etc/confluent/docker/run'"
  volumes:
    - ../build/confluent/kafka-connect-mongodb:/usr/share/confluent-hub-components/kafka-connect-mongodb
Dockerfile:
FROM confluentinc/cp-kafka-connect
ENV CONNECT_PLUGIN_PATH="/usr/share/java,/usr/share/confluent-hub-components"
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-datagen
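To build this image and bring the stack up, the usual sequence is something like the following (a sketch; it assumes the Dockerfile sits next to the docker-compose.yml above):

```shell
# Rebuild the connect image and start everything in the background
docker-compose up -d --build

# Once the worker has finished starting, its REST API answers on port 8083
curl -s http://localhost:8083/connector-plugins
```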
Problem:
The kafka-connect-datagen image generates fake data and, as mentioned in the repository, it is not suitable for production. What I want is just to connect Kafka to MongoDB, neither less nor more than that. Explicitly: how can I send data from Kafka with curl and save it in a MongoDB collection?
I face the CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL is required. error. As @cricket_007 said, schema-registry is optional. So how can I get rid of that image?
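For context, that error comes from the converter choice: io.confluent.connect.avro.AvroConverter requires a Schema Registry URL, while org.apache.kafka.connect.json.JsonConverter does not. One way to drop the schema-registry container entirely is to switch both converters to JSON in the connect service's environment (a sketch; adjust to your own setup):

```yaml
environment:
  # JsonConverter needs no Schema Registry
  CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
  CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
  # Set to "false" when messages are plain JSON without an embedded schema
  CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE: "false"
  CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE: "false"
```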
At the last step I tried to run the repository's docker-compose file as explained in its README.md; unfortunately, I faced another error:
WARNING: Could not reach configured kafka system on http://localhost:8083 Note: This script requires curl.
Even when I didn't make any change to the configuration, I faced another error:
Kafka Connectors:
{"error_code":409,"message":"Cannot complete request momentarily due to stale configuration (typically caused by a concurrent config change)"}
Please help me find answers to my questions.
My output:
Building the MongoDB Kafka Connector
> Task :shadowJar
FatJar: /home/mostafa/Documents/Docker/kafka-mongo/build/libs/kafka-mongo-0.3-SNAPSHOT-all.jar (2.108904 MB)
Deprecated Gradle features were used in this build, making it incompatible with Gradle 6.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/5.2/userguide/command_line_interface.html#sec:command_line_warnings
BUILD SUCCESSFUL in 4h 26m 25s
7 actionable tasks: 7 executed
Unzipping the confluent archive plugin....
Archive: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT.zip
creating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/
creating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/etc/
inflating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/etc/MongoSinkConnector.properties
inflating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/etc/MongoSourceConnector.properties
creating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/lib/
inflating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/lib/kafka-mongo-0.3-SNAPSHOT-all.jar
inflating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/manifest.json
creating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/assets/
inflating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/assets/mongodb-leaf.png
inflating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/assets/mongodb-logo.png
creating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/doc/
inflating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/doc/README.md
inflating: ./build/confluent/mongodb-kafka-connect-mongodb-0.3-SNAPSHOT/doc/LICENSE.txt
Starting docker .
Creating volume "docker_rs2" with default driver
Creating volume "docker_rs3" with default driver
Building connect
Step 1/3 : FROM confluentinc/cp-kafka-connect:5.2.2
---> 32bb41f78617
Step 2/3 : ENV CONNECT_PLUGIN_PATH="/usr/share/confluent-hub-components"
---> Using cache
---> 9e4fd4f10a38
Step 3/3 : RUN confluent-hub install --no-prompt confluentinc/kafka-connect-datagen:latest
---> Using cache
---> 5f879008bb73
Successfully built 5f879008bb73
Successfully tagged confluentinc/kafka-connect-datagen:latest
Recreating mongo1 ...
Recreating mongo1 ... done
Creating mongo3 ... done
Starting broker ... done
Creating mongo2 ... done
Starting schema-registry ... done
Starting connect ... done
Creating rest-proxy ... done
Creating ksql-server ... done
Creating docker_kafka-topics-ui_1 ... done
Creating control-center ... done
Creating ksql-cli ... done
Waiting for the systems to be ready.............
WARNING: Could not reach configured kafka system on http://localhost:8082
Note: This script requires curl.
SHUTTING DOWN
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 68 100 68 0 0 23 0 0:00:02 0:00:02 --:--:-- 23
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 61 100 61 0 0 4066 0 --:--:-- --:--:-- --:--:-- 4066
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 63 100 63 0 0 9000 0 --:--:-- --:--:-- --:--:-- 9000
MongoDB shell version v4.0.12
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("80ebb904-f81a-4230-b63b-4e62f65fbeb7") }
MongoDB server version: 4.0.12
{
"ok" : 1,
"operationTime" : Timestamp(1567235833, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1567235833, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
Stopping ksql-cli ... done
Stopping control-center ... done
Stopping docker_kafka-topics-ui_1 ... done
Stopping ksql-server ... done
Stopping rest-proxy ... done
Stopping mongo1 ... done
Stopping mongo2 ... done
Stopping mongo3 ... done
Stopping connect ... done
Stopping broker ... done
Stopping zookeeper ... done
Removing ksql-cli ...
Removing control-center ... done
Removing docker_kafka-topics-ui_1 ... done
Removing ksql-server ... done
Removing rest-proxy ... done
Removing mongo1 ... done
Removing mongo2 ... done
Removing mongo3 ... done
Removing connect ... done
Removing schema-registry ... done
Removing broker ... done
Removing zookeeper ... done
Removing network docker_default
Removing network docker_localnet
WARNING: Could not reach configured kafka system on http://localhost:8082
Note: This script requires curl.
I created the following docker-compose file (view all files on GitHub):
version: '3.6'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.1.2
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    networks:
      - localnet
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  broker:
    image: confluentinc/cp-enterprise-kafka:5.1.2
    hostname: broker
    container_name: broker
    depends_on:
      - zookeeper
    ports:
      - "29092:29092"
      - "9092:9092"
    networks:
      - localnet
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: broker:29092
      CONFLUENT_METRICS_REPORTER_ZOOKEEPER_CONNECT: zookeeper:2181
      CONFLUENT_METRICS_REPORTER_TOPIC_REPLICAS: 1
      CONFLUENT_METRICS_ENABLE: 'true'
      CONFLUENT_SUPPORT_CUSTOMER_ID: 'anonymous'
  connect:
    image: confluentinc/cp-kafka-connect:5.1.2
    build:
      context: .
      dockerfile: Dockerfile
    hostname: connect
    container_name: connect
    depends_on:
      - zookeeper
      - broker
    ports:
      - "8083:8083"
    networks:
      - localnet
    environment:
      CONNECT_BOOTSTRAP_SERVERS: 'broker:29092'
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: compose-connect-group
      CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
      CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_LOG4J_ROOT_LOGLEVEL: "INFO"
      CONNECT_LOG4J_LOGGERS: "org.apache.kafka.connect.runtime.rest=WARN,org.reflections=ERROR,com.mongodb.kafka=DEBUG"
      CONNECT_PLUGIN_PATH: /usr/share/confluent-hub-components
      CONNECT_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      # Assumes image is based on confluentinc/kafka-connect-datagen:latest which is pulling 5.2.2 Connect image
      CLASSPATH: /usr/share/java/monitoring-interceptors/monitoring-interceptors-5.2.2.jar
      CONNECT_PRODUCER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor"
      CONNECT_CONSUMER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor"
    command: "bash -c 'if [ ! -d /usr/share/confluent-hub-components/confluentinc-kafka-connect-datagen ]; then echo \"WARNING: Did not find directory for kafka-connect-datagen (did you remember to run: docker-compose up -d --build ?)\"; fi ; /etc/confluent/docker/run'"
    volumes:
      - ./kafka-connect-mongodb:/usr/share/confluent-hub-components/kafka-connect-mongodb
  # MongoDB Replica Set
  mongo1:
    image: "mongo:4.0-xenial"
    container_name: mongo1
    command: --replSet rs0 --smallfiles --oplogSize 128
    volumes:
      - rs1:/data/db
    networks:
      - localnet
    ports:
      - "27017:27017"
    restart: always
  mongo2:
    image: "mongo:4.0-xenial"
    container_name: mongo2
    command: --replSet rs0 --smallfiles --oplogSize 128
    volumes:
      - rs2:/data/db
    networks:
      - localnet
    ports:
      - "27018:27017"
    restart: always
  mongo3:
    image: "mongo:4.0-xenial"
    container_name: mongo3
    command: --replSet rs0 --smallfiles --oplogSize 128
    volumes:
      - rs3:/data/db
    networks:
      - localnet
    ports:
      - "27019:27017"
    restart: always
networks:
  localnet:
    attachable: true
volumes:
  rs1:
  rs2:
  rs3:
After executing docker-compose up, you have to configure your MongoDB cluster:
docker-compose exec mongo1 /usr/bin/mongo --eval '''if (rs.status()["ok"] == 0) {
    rsconf = {
        _id : "rs0",
        members: [
            { _id : 0, host : "mongo1:27017", priority: 1.0 },
            { _id : 1, host : "mongo2:27017", priority: 0.5 },
            { _id : 2, host : "mongo3:27017", priority: 0.5 }
        ]
    };
    rs.initiate(rsconf);
}
rs.conf();'''
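To confirm the replica set actually initialized, a quick sanity check against the members defined above:

```shell
# Expect one PRIMARY and two SECONDARY members
docker-compose exec mongo1 /usr/bin/mongo --quiet \
  --eval 'rs.status().members.forEach(function(m) { print(m.name, m.stateStr); })'
```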
Make sure that your plugin is installed:
curl localhost:8083/connector-plugins | jq
[
  {
    "class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "type": "sink",
    "version": "0.2"
  },
  {
    "class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "type": "source",
    "version": "0.2"
  },
  {
    "class": "io.confluent.connect.gcs.GcsSinkConnector",
    "type": "sink",
    "version": "5.0.1"
  },
  {
    "class": "io.confluent.connect.storage.tools.SchemaSourceConnector",
    "type": "source",
    "version": "2.1.1-cp1"
  },
  {
    "class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "type": "sink",
    "version": "2.1.1-cp1"
  },
  {
    "class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "type": "source",
    "version": "2.1.1-cp1"
  }
]
As you can see above, the MongoDB connector plugins are available for use. Assuming you have a database named mydb and a collection named products, I create a JSON file named sink-connector.json:
{
  "name": "mongo-sink",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "tasks.max": "1",
    "topics": "product.events",
    "connection.uri": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017",
    "database": "mydb",
    "collection": "products",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
  }
}
Now create the connector using the Connect RESTful API:
curl -X POST -H "Content-Type: application/json" -d @sink-connector.json http://localhost:8083/connectors | jq
You can view the status of your connector:
curl http://localhost:8083/connectors/mongo-sink/status | jq
{
  "name": "mongo-sink",
  "connector": {
    "state": "RUNNING",
    "worker_id": "connect:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "connect:8083"
    }
  ],
  "type": "sink"
}
Now let's create a Kafka topic. First, we must connect to the Kafka container:
docker-compose exec broker bash
Then create the topic:
kafka-topics --zookeeper zookeeper:2181 --create --topic product.events --partitions 1 --replication-factor 1
Now produce some products into the topic:
kafka-console-producer --broker-list localhost:9092 --topic product.events
>{"Name": "Hat", "Price": 25}
>{"Name": "Shoe", "Price": 15}
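If the sink connector is healthy, both records should appear in the products collection within a few seconds. A quick way to check, reusing the containers defined above:

```shell
# Read back the sunk documents through one replica set member
docker-compose exec mongo1 /usr/bin/mongo mydb --quiet --eval 'db.products.find().pretty()'
```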
You can view the result in the image:
Hope this helps.
Debezium reads data from Mongo. If you want a sink connector, you'll need to use that official one you found, but there are also others available on GitHub, for example.
Kafka Connect uses a REST API, so you'll also need to create a JSON payload with all the connection and topic details. There are guides in that repo you found.
it has run multiple containers of mongodb and also used many images that consume so much resources.
You do not need KSQL, Control Center, REST Proxy, Topic UI, etc. Only Kafka, Zookeeper, Connect, Mongo, and optionally the Schema Registry, so just remove the other containers from the compose file. You also probably don't need multiple Mongo containers, but then you'll need to reconfigure the environment variables to adjust to only one instance.
how can I send data from kafka with curl and save them in a mongodb collection?
If you did want to use curl, then you will need to start the REST Proxy container. That would get you past the Could not reach configured kafka system on http://localhost:8082 error message.
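For illustration, producing a record over HTTP with the Confluent REST Proxy v2 API looks roughly like this (assuming the proxy listens on 8082 and the topic is product.events, as in the other answer):

```shell
# POST one JSON record; the sink connector then writes it to MongoDB
curl -X POST http://localhost:8082/topics/product.events \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  -d '{"records": [{"value": {"Name": "Hat", "Price": 25}}]}'
```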