Mongo Kafka Connector Collection Listen Limitations
We have several collections in Mongo, based on n tenants, and want the Kafka connector to watch only specific collections.
Below is my mongosource.properties file, where I have added a pipeline filter to listen only to specific collections. It works:
pipeline=[{"$match":{"ns.coll":{"$in":["ecom-tesla-cms-instance","ca-tesla-cms-instance","ecom-tesla-cms-page","ca-tesla-cms-page"]}}}]
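To see which change-stream events such a filter passes, here is a small plain-Python sketch that mimics the $match / $in stage above (the event shape is an assumption for illustration; the ns field of a MongoDB change-stream event names the database and collection):

```python
# Sketch: mimic {"$match": {"ns.coll": {"$in": [...]}}} in plain Python.
# Collection names are taken from the pipeline above.

WATCHED = {
    "ecom-tesla-cms-instance",
    "ca-tesla-cms-instance",
    "ecom-tesla-cms-page",
    "ca-tesla-cms-page",
}

def passes_filter(event: dict) -> bool:
    """Return True if the event's collection is in the watched set."""
    return event.get("ns", {}).get("coll") in WATCHED

# Example change-stream events (shape assumed for illustration):
print(passes_filter({"ns": {"db": "tenants", "coll": "ecom-tesla-cms-instance"}}))  # True
print(passes_filter({"ns": {"db": "tenants", "coll": "orders"}}))                   # False
```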
The collections will grow in the future, to maybe 200 collections that have to be watched. I wanted to know the three things below.
Best practice would say to run many connectors, where "many" depends on your ability to maintain the overhead of them all.
The reason being: one connector creates a single point of failure (per task, but only one task should be assigned to any collection at a time, to prevent duplicates). If a Connect task fails with a non-retryable error, that will halt the connector's tasks completely and stop reading from all collections assigned to that connector.
You could also try Debezium, which might use fewer resources than the Mongo Source Connector, since it acts as a replica rather than querying the collection at an interval.
You can listen to multiple change streams from multiple Mongo collections; you just need to provide a suitable regex for the collection names in the pipeline. You can even exclude a collection or collections by providing a regex matching the ones whose change streams you don't want to listen to.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
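To sanity-check which names the two $regex conditions above would select, here is the same logic reproduced with Python's re module ("database-name" and the "collection_" prefix are the placeholders from the pipeline):

```python
import re

# Sketch: the $and of the two $regex conditions from the pipeline above.
DB_RE = re.compile(r"^database-name$")
COLL_RE = re.compile(r"^collection_.*")

def watched(db: str, coll: str) -> bool:
    """True when both the database and the collection name match."""
    return bool(DB_RE.match(db) and COLL_RE.match(coll))

print(watched("database-name", "collection_orders"))  # True
print(watched("database-name", "users"))              # False
```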
You can even exclude any given database using $nin, so that you don't listen to any of its change streams.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/,\"$nin\":[/^any_database_name$/]}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
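The combined include/exclude logic of that pipeline can be illustrated with a plain-Python sketch (all names are the placeholders from the pipeline above):

```python
import re

# Sketch: regex include plus $nin-style database exclusion, in plain Python.
DB_RE = re.compile(r"^database-name$")
EXCLUDED_DBS = [re.compile(r"^any_database_name$")]  # the $nin list
COLL_RE = re.compile(r"^collection_.*")

def watched_with_exclusions(db: str, coll: str) -> bool:
    """Database must match DB_RE and none of EXCLUDED_DBS ($nin),
    and the collection must match COLL_RE."""
    if not DB_RE.match(db):
        return False
    if any(p.match(db) for p in EXCLUDED_DBS):
        return False
    return bool(COLL_RE.match(coll))
```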
1. Is there some performance impact with one connector listening to a huge number of collections?
2. Is there any limit on the number of collections one connector can watch?
3. What would be the best practice: to run one connector listening to 100 collections, or 10 different connectors listening to 10 collections each?
Running N Kafka connectors, one for each collection, would be an overhead. Make sure you provide fault tolerance using the recommended configurations; just don't rely on the connector's default configuration. Here is a basic Kafka connector configuration.
Mongo to Kafka source connector
{
  "name": "mongo-to-kafka-connect",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "publish.full.document.only": "true",
    "tasks.max": "3",
    "key.converter.schemas.enable": "false",
    "topic.creation.enable": "true",
    "poll.await.time.ms": 1000,
    "poll.max.batch.size": 100,
    "topic.prefix": "any prefix for topic name",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
    "connection.uri": "mongodb://<username>:<password>@ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
    "value.converter.schemas.enable": "false",
    "copy.existing": "true",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 3,
    "topic.creation.compacted.cleanup.policy": "compact",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "mongo.errors.log.enable": "true",
    "heartbeat.interval.ms": 10000,
    "pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
  }
}
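A config like the one above is typically registered by POSTing it to the Kafka Connect REST API. Here is a hedged Python sketch using only the standard library; the Connect worker address (localhost:8083) and the trimmed-down config are assumptions for illustration:

```python
import json
import urllib.request

# Sketch: register a connector config with the Kafka Connect REST API,
# assumed here to be reachable at localhost:8083.

def build_register_request(config: dict, base_url: str = "http://localhost:8083"):
    """Build a POST request for the Connect /connectors endpoint."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Trimmed-down example payload (see the full config above):
connector = {
    "name": "mongo-to-kafka-connect",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "tasks.max": "3",
    },
}
req = build_register_request(connector)
# urllib.request.urlopen(req)  # uncomment to actually submit it to a running worker
print(req.full_url, req.get_method())  # http://localhost:8083/connectors POST
```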
You can get more details from the official docs.