Single or multiple source Kafka connector(s) for reading multiple collections in a MongoDB cluster
I want to know whether it is recommended to create multiple Kafka connectors for streaming data from multiple collections in the same database, or from different databases, within the same MongoDB cluster.
I believe there is only one oplog per cluster, so a single connector could easily read the data for multiple collections, and this approach should put less load on the cluster. However, I am not sure how easy it will be to route the data to a different Kafka topic per collection. With the second approach, creating multiple connectors, I feel it would put too much load on the server.
Please suggest the recommended approach.
You can listen to change streams from multiple MongoDB collections with a single connector; you just need to provide a suitable regex for the collection names in the pipeline setting. Note that the source connector already writes each namespace to its own topic (named from the topic prefix, database name, and collection name), so a single connector still yields one topic per collection. You can also exclude one or more collections by providing a regex for the namespaces you don't want to receive change streams from.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
You can even exclude any given database using $nin, if you don't want to listen to any change streams from it.
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/,\"$nin\":[/^any_database_name$/]}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
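To sanity-check which namespaces such a $match stage would pass, here is a small Python sketch that mimics the include/exclude logic. The patterns are illustrative only (the database pattern is widened here so the $nin-style exclusion has a visible effect); they are not taken from any real deployment.

```python
import re

# Illustrative patterns only, mimicking the pipeline's $match stage:
include_db = re.compile(r"^database.*")           # like the ns.db $regex
exclude_db = [re.compile(r"^database_staging$")]  # like the ns.db $nin list
include_coll = re.compile(r"^collection_.*")      # like the ns.coll $regex

def passes(db: str, coll: str) -> bool:
    """Return True if a change event from db.coll would pass the $match stage."""
    if not include_db.match(db):
        return False
    if any(p.match(db) for p in exclude_db):
        return False
    return bool(include_coll.match(coll))

print(passes("database-name", "collection_users"))     # True
print(passes("database_staging", "collection_users"))  # False (excluded db)
print(passes("database-name", "audit_log"))            # False (coll not matched)
```

The same structure applies to the real pipeline: the database must match the inclusion regex, must not match any exclusion pattern, and the collection must match its regex.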
Creating N Kafka connectors, one for each collection, would be an overhead; instead I would recommend creating a single Kafka connector. Make sure you provide fault tolerance using the recommended configurations, and don't rely on the connector's default configuration. Here is a basic Kafka connector configuration.
Mongo to Kafka source connector
{
  "name": "mongo-to-kafka-connect",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "publish.full.document.only": "true",
    "tasks.max": "3",
    "key.converter.schemas.enable": "false",
    "topic.creation.enable": "true",
    "poll.await.time.ms": 1000,
    "poll.max.batch.size": 100,
    "topic.prefix": "any prefix for topic name",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
    "connection.uri": "mongodb://<username>:<password>@ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
    "value.converter.schemas.enable": "false",
    "copy.existing": "true",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 3,
    "topic.creation.compacted.cleanup.policy": "compact",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "mongo.errors.log.enable": "true",
    "heartbeat.interval.ms": 10000,
    "pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collection_.*/}}]}}]"
  }
}
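Once the JSON above is saved to a file (here a hypothetical mongo-source.json), the connector can be registered with the Kafka Connect REST API; this sketch assumes a Connect worker listening on the conventional localhost:8083.

```shell
# Register the connector with a Connect worker (assumed at localhost:8083)
curl -X POST -H "Content-Type: application/json" \
  --data @mongo-source.json \
  http://localhost:8083/connectors

# Check the connector and task status afterwards
curl http://localhost:8083/connectors/mongo-to-kafka-connect/status
```

The connector name in the status URL must match the "name" field in the config.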
You can get more details from the official docs.