
Single or multiple source Kafka connector(s) for reading multiple collections in a MongoDB cluster

I want to know whether it is recommended to create multiple Kafka connectors for streaming data from multiple collections in the same database, or in different databases, within the same MongoDB cluster.

I think there will be only one oplog per cluster, so it is easy to read the data for multiple collections, and this approach will put less load on the cluster. But I am not sure how easy it will be to route the data to a different Kafka topic per collection. The second approach, creating multiple connectors, feels like it would put too much load on the server.

Please suggest which approach is recommended.

You can listen to change streams from multiple MongoDB collections with a single connector; you just need to provide a suitable regex for the collection names in the pipeline setting. You can also exclude one or more collections by writing a regex that filters out the namespaces you don't want to listen to.

"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"  

You can also exclude any given database whose change streams you don't want to listen to by using $nin.

"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/,\"$nin\":[/^any_database_name$/]}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"

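Before deploying the connector, it can help to sanity-check which namespaces such a pipeline would capture. Below is a minimal Python sketch that only exercises the same regular expressions locally against hypothetical database/collection names; it does not involve the connector itself:

import re

# Same patterns as in the pipeline examples above (names are placeholders).
db_include = re.compile(r"^database-name$")
db_exclude = re.compile(r"^any_database_name$")   # mirrors the $nin exclusion
coll_include = re.compile(r"^collection_.*")

# Hypothetical (database, collection) namespaces to test against the filters.
namespaces = [
    ("database-name", "collection_orders"),
    ("database-name", "users"),
    ("any_database_name", "collection_logs"),
]

for db, coll in namespaces:
    captured = (
        db_include.search(db) is not None
        and db_exclude.search(db) is None
        and coll_include.search(coll) is not None
    )
    print(f"{db}.{coll}: {'captured' if captured else 'ignored'}")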
Coming to your questions:

  • From my point of view, creating N Kafka connectors, one per collection, would be an overhead; instead I would recommend creating a single Kafka connector. Make sure you provide fault tolerance using the recommended configurations, and don't rely on the connector's default configuration.

Here is a basic Kafka connector configuration.

Mongo to Kafka source connector

{
  "name": "mongo-to-kafka-connect",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "publish.full.document.only": "true",
    "tasks.max": "3",
    "key.converter.schemas.enable": "false",
    "topic.creation.enable": "true",
    "poll.await.time.ms": 1000,
    "poll.max.batch.size": 100,
    "topic.prefix": "any prefix for topic name",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
    "connection.uri": "mongodb://<username>:<password>@ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
    "value.converter.schemas.enable": "false",
    "copy.existing": "true",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 3,
    "topic.creation.compacted.cleanup.policy": "compact",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "mongo.errors.log.enable": "true",
    "heartbeat.interval.ms": 10000,
    "pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
  }
}
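As a usage example, a config like the one above is normally registered through the Kafka Connect REST API. Here is a minimal sketch, assuming a Connect worker is reachable at http://localhost:8083 and that the placeholders (connection URI, topic prefix, and so on) have been filled in:

import json
import urllib.request

# Connector name plus the "config" map shown above (only a few keys repeated here;
# fill in the rest and replace the placeholders before running).
connector = {
    "name": "mongo-to-kafka-connect",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://<username>:<password>@ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
        "topic.prefix": "any prefix for topic name",
        # ... remaining settings from the configuration above ...
    },
}

# POST /connectors creates the connector on the Connect cluster.
request = urllib.request.Request(
    "http://localhost:8083/connectors",
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print(response.status, response.read().decode())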

You can get more details from the official docs.
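For completeness, with the connector's default topic naming each captured collection lands on its own topic, named <topic.prefix>.<database>.<collection>. Below is a minimal consumer sketch using the kafka-python package, with an assumed broker address and topic name:

from kafka import KafkaConsumer  # pip install kafka-python

# Topic name follows the default <topic.prefix>.<database>.<collection> pattern
# (the prefix, database and collection names below are assumptions).
consumer = KafkaConsumer(
    "my-prefix.database-name.collection_orders",
    bootstrap_servers="localhost:9092",      # assumed broker address
    auto_offset_reset="earliest",
    value_deserializer=lambda v: v.decode("utf-8"),
)

for message in consumer:
    # With StringConverter and SimplifiedJson, each value is a plain JSON string
    # of the full document (publish.full.document.only is true in the config above).
    print(message.value)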
