Mongo Kafka Connector Collection Listen Limitations

We have several collections in Mongo based on n tenants, and we want the Kafka connector to watch only specific collections.

Below is my mongosource.properties file, where I have added the pipeline filter to listen only to specific collections. It works:

pipeline=[{$match:{"ns.coll":{"$in":["ecom-tesla-cms-instance","ca-tesla-cms-instance","ecom-tesla-cms-page","ca-tesla-cms-page"]}}}]

The collections will grow in the future, to maybe 200 collections that have to be watched, so I wanted to know the following three things:

  1. Is there some performance impact with one connector listening to a huge number of collections?
  2. Is there any limit on the number of collections one connector can watch?
  3. What would be the best practice: to run one connector listening to 100 collections, or 10 different connectors listening to 10 collections each?

Best practice would say to run many connectors, where "many" depends on your ability to maintain the overhead of them all.

The reason being: one connector creates a single point of failure (per task, but only one task should be assigned to any collection at a time, to prevent duplicates). If a Connect task fails with a non-retryable error, that will halt the connector's tasks completely and stop reading from all collections assigned to that connector.
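
For example, splitting the namespaces across a few connectors keeps a failure in one from halting the rest. A minimal sketch, assuming the same MongoSourceConnector class shown further down; the connector names, connection URI, and collection groupings are illustrative only:

{
  "name": "mongo-source-cms-instance",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb://<username>:<password>@ip:27017/?replicaSet=xyz",
    "pipeline": "[{\"$match\":{\"ns.coll\":{\"$in\":[\"ecom-tesla-cms-instance\",\"ca-tesla-cms-instance\"]}}}]"
  }
}

{
  "name": "mongo-source-cms-page",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb://<username>:<password>@ip:27017/?replicaSet=xyz",
    "pipeline": "[{\"$match\":{\"ns.coll\":{\"$in\":[\"ecom-tesla-cms-page\",\"ca-tesla-cms-page\"]}}}]"
  }
}

Each connector then fails (and is restarted) independently, and each pipeline only has to match its own subset of collections.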

You could also try Debezium, which might have lower resource usage than the Mongo Source Connector since it acts as a replica rather than querying the collection at an interval.
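
For comparison, a minimal sketch of a Debezium MongoDB source connector; property names differ between Debezium versions (this assumes a recent 2.x release, and the database name, topic prefix, and collection list are placeholders), so verify them against the Debezium docs for your release:

{
  "name": "debezium-mongo-source",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "mongodb.connection.string": "mongodb://<username>:<password>@ip:27017/?replicaSet=xyz",
    "topic.prefix": "tesla",
    "collection.include.list": "database-name\\.ecom-tesla-cms-.*,database-name\\.ca-tesla-cms-.*",
    "tasks.max": "1"
  }
}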

You can listen to multiple change streams from multiple mongo collections; you just need to provide a suitable regex for the collection names in pipeline. You can even exclude a collection or collections by providing a regex for those you don't want to listen to any change streams from.

"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"  

You can even exclude any given database using $nin, if you don't want to listen for any change stream from it.

"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/,\"$nin\":[/^any_database_name$/]}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"

Coming to your questions:

  1. Is there some performance impact with one connector listening to a huge number of collections?

    • To my knowledge, no, since it is not mentioned anywhere in the docs. You can listen to multiple mongo collections using a single connector.
  2. Is there any limit on the number of collections one connector can watch?

    • Again, to my knowledge there is no limit mentioned in the docs.
  3. What would be the best practice: to run one connector listening to 100 collections, or 10 different connectors listening to 10 collections each?

    • From my point of view it would be an overhead to create N Kafka connectors, one for each collection. Make sure you provide fault tolerance using the recommended configurations; just don't rely on the connector's default configuration.

Here is a basic Kafka connector configuration.

Mongo to Kafka source connector

{
  "name": "mongo-to-kafka-connect",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "publish.full.document.only": "true",
    "tasks.max": "3",
    "key.converter.schemas.enable": "false",
    "topic.creation.enable": "true",
    "poll.await.time.ms": 1000,
    "poll.max.batch.size": 100,
    "topic.prefix": "any prefix for topic name",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
    "connection.uri": "mongodb://<username>:<password>@ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
    "value.converter.schemas.enable": "false",
    "copy.existing": "true",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 3,
    "topic.creation.compacted.cleanup.policy": "compact",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "mongo.errors.log.enable": "true",
    "heartbeat.interval.ms": 10000,
    "pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
  }
}

You can get more details from the official docs.
