Single or multiple source Kafka connector(s) for reading multiple collections in a MongoDB cluster
I want to know whether it is recommended to create multiple Kafka connectors for streaming data from multiple collections in the same database, or from different databases, within the same MongoDB cluster.
I believe there is only one oplog per cluster, so a single connector could easily read the data for multiple collections, and this approach should put less load on the cluster. However, I am not sure how easy it will be to route the data to a different Kafka topic per collection. With the second approach, creating multiple connectors, I feel it would put too much load on the server.
Please suggest the recommended approach.
You can listen to change streams from multiple MongoDB collections with a single connector; you just need to provide a suitable regex for the collection names in the pipeline setting. Note that the source connector already writes each namespace to its own topic (named from the topic prefix, database name, and collection name), so a single connector still yields one topic per collection. You can also exclude one or more collections by providing a regex for the namespaces you don't want to receive change streams from.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
You can even exclude any given database using $nin, if you don't want to listen to any change streams from it.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/,\"$nin\":[/^any_database_name$/]}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
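To sanity-check which namespaces such a $match stage would pass, here is a small Python sketch that mimics the include/exclude logic. The patterns are illustrative only (the database pattern is widened here so the $nin-style exclusion has a visible effect); they are not taken from any real deployment.

```python
import re

# Illustrative patterns only, mimicking the pipeline's $match stage:
include_db = re.compile(r"^database.*")           # like the ns.db $regex
exclude_db = [re.compile(r"^database_staging$")]  # like the ns.db $nin list
include_coll = re.compile(r"^collection_.*")      # like the ns.coll $regex

def passes(db: str, coll: str) -> bool:
    """Return True if a change event from db.coll would pass the $match stage."""
    if not include_db.match(db):
        return False
    if any(p.match(db) for p in exclude_db):
        return False
    return bool(include_coll.match(coll))

print(passes("database-name", "collection_users"))     # True
print(passes("database_staging", "collection_users"))  # False (excluded db)
print(passes("database-name", "audit_log"))            # False (coll not matched)
```

The same structure applies to the real pipeline: the database must match the inclusion regex, must not match any exclusion pattern, and the collection must match its regex.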
Creating N Kafka connectors, one for each collection, would be an overhead; instead I would recommend creating a single Kafka connector. Make sure you provide fault tolerance using the recommended configurations, and don't rely on the connector's default configuration. Here is a basic Kafka connector configuration.
Mongo to Kafka source connector
{
  "name": "mongo-to-kafka-connect",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "publish.full.document.only": "true",
    "tasks.max": "3",
    "key.converter.schemas.enable": "false",
    "topic.creation.enable": "true",
    "poll.await.time.ms": 1000,
    "poll.max.batch.size": 100,
    "topic.prefix": "any prefix for topic name",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
    "connection.uri": "mongodb://<username>:<password>@ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
    "value.converter.schemas.enable": "false",
    "copy.existing": "true",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 3,
    "topic.creation.compacted.cleanup.policy": "compact",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "mongo.errors.log.enable": "true",
    "heartbeat.interval.ms": 10000,
    "pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
  }
}
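Once the JSON above is saved to a file (here a hypothetical mongo-source.json), the connector can be registered with the Kafka Connect REST API; this sketch assumes a Connect worker listening on the conventional localhost:8083.

```shell
# Register the connector with a Connect worker (assumed at localhost:8083)
curl -X POST -H "Content-Type: application/json" \
  --data @mongo-source.json \
  http://localhost:8083/connectors

# Check the connector and task status afterwards
curl http://localhost:8083/connectors/mongo-to-kafka-connect/status
```

The connector name in the status URL must match the "name" field in the config.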
You can get more details from the official docs.