Mongo Kafka Connector Collection Listen Limitations
We have several collections in Mongo, based on n tenants, and want the Kafka connector to watch only specific collections.
Below is my mongosource.properties file, where I have added a pipeline filter to listen only to specific collections. It works:
pipeline=[{"$match":{"ns.coll":{"$in":["ecom-tesla-cms-instance","ca-tesla-cms-instance","ecom-tesla-cms-page","ca-tesla-cms-page"]}}}]
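To see which change-stream events such a filter passes, here is a small plain-Python sketch that mimics the $match / $in stage above (the event shape is an assumption for illustration; the ns field of a MongoDB change-stream event names the database and collection):

```python
# Sketch: mimic {"$match": {"ns.coll": {"$in": [...]}}} in plain Python.
# Collection names are taken from the pipeline above.

WATCHED = {
    "ecom-tesla-cms-instance",
    "ca-tesla-cms-instance",
    "ecom-tesla-cms-page",
    "ca-tesla-cms-page",
}

def passes_filter(event: dict) -> bool:
    """Return True if the event's collection is in the watched set."""
    return event.get("ns", {}).get("coll") in WATCHED

# Example change-stream events (shape assumed for illustration):
print(passes_filter({"ns": {"db": "tenants", "coll": "ecom-tesla-cms-instance"}}))  # True
print(passes_filter({"ns": {"db": "tenants", "coll": "orders"}}))                   # False
```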
The collections will grow in the future, to maybe 200 collections that have to be watched. I wanted to know the three things below.
Best practice would say to run many connectors, where "many" depends on your ability to maintain the overhead of them all.
The reason being: one connector creates a single point of failure (per task, but only one task should be assigned to any collection at a time, to prevent duplicates). If a Connect task fails with a non-retryable error, that will halt the connector's tasks completely and stop reading from all collections assigned to that connector.
You could also try Debezium, which might use fewer resources than the Mongo Source Connector, since it acts as a replica rather than querying the collection at an interval.
You can listen to multiple change streams from multiple Mongo collections; you just need to provide a suitable regex for the collection names in the pipeline. You can even exclude a collection or collections by providing a regex matching the ones whose change streams you don't want to listen to.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
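To sanity-check which names the two $regex conditions above would select, here is the same logic reproduced with Python's re module ("database-name" and the "collection_" prefix are the placeholders from the pipeline):

```python
import re

# Sketch: the $and of the two $regex conditions from the pipeline above.
DB_RE = re.compile(r"^database-name$")
COLL_RE = re.compile(r"^collection_.*")

def watched(db: str, coll: str) -> bool:
    """True when both the database and the collection name match."""
    return bool(DB_RE.match(db) and COLL_RE.match(coll))

print(watched("database-name", "collection_orders"))  # True
print(watched("database-name", "users"))              # False
```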
You can even exclude any given database using $nin, so that you don't listen to any of its change streams.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/,\"$nin\":[/^any_database_name$/]}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
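The combined include/exclude logic of that pipeline can be illustrated with a plain-Python sketch (all names are the placeholders from the pipeline above):

```python
import re

# Sketch: regex include plus $nin-style database exclusion, in plain Python.
DB_RE = re.compile(r"^database-name$")
EXCLUDED_DBS = [re.compile(r"^any_database_name$")]  # the $nin list
COLL_RE = re.compile(r"^collection_.*")

def watched_with_exclusions(db: str, coll: str) -> bool:
    """Database must match DB_RE and none of EXCLUDED_DBS ($nin),
    and the collection must match COLL_RE."""
    if not DB_RE.match(db):
        return False
    if any(p.match(db) for p in EXCLUDED_DBS):
        return False
    return bool(COLL_RE.match(coll))
```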
1. Is there some performance impact with one connector listening to a huge number of collections?
2. Is there any limit on the number of collections one connector can watch?
3. What would be the best practice: to run one connector listening to 100 collections, or 10 different connectors listening to 10 collections each?
Running N Kafka connectors, one for each collection, would be an overhead. Make sure you provide fault tolerance using the recommended configurations; just don't rely on the connector's default configuration. Here is a basic Kafka connector configuration.
Mongo to Kafka source connector
{
  "name": "mongo-to-kafka-connect",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "publish.full.document.only": "true",
    "tasks.max": "3",
    "key.converter.schemas.enable": "false",
    "topic.creation.enable": "true",
    "poll.await.time.ms": 1000,
    "poll.max.batch.size": 100,
    "topic.prefix": "any prefix for topic name",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
    "connection.uri": "mongodb://<username>:<password>@ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
    "value.converter.schemas.enable": "false",
    "copy.existing": "true",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 3,
    "topic.creation.compacted.cleanup.policy": "compact",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "mongo.errors.log.enable": "true",
    "heartbeat.interval.ms": 10000,
    "pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
  }
}
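A config like the one above is typically registered by POSTing it to the Kafka Connect REST API. Here is a hedged Python sketch using only the standard library; the Connect worker address (localhost:8083) and the trimmed-down config are assumptions for illustration:

```python
import json
import urllib.request

# Sketch: register a connector config with the Kafka Connect REST API,
# assumed here to be reachable at localhost:8083.

def build_register_request(config: dict, base_url: str = "http://localhost:8083"):
    """Build a POST request for the Connect /connectors endpoint."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Trimmed-down example payload (see the full config above):
connector = {
    "name": "mongo-to-kafka-connect",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "tasks.max": "3",
    },
}
req = build_register_request(connector)
# urllib.request.urlopen(req)  # uncomment to actually submit it to a running worker
print(req.full_url, req.get_method())  # http://localhost:8083/connectors POST
```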
You can get more details from the official docs.