Use multiple collections with MongoDB Kafka Connector

According to the documentation, if you don't provide a value it will read from all collections:

"name of the collection in the database to watch for changes. If not set then all collections will be watched." “数据库中要监视更改的集合的名称。如果未设置,则将监视所有 collections。”

I checked the connector source code and confirmed this:

https://github.com/mongodb/mongo-kafka/blob/k133/src/main/java/com/mongodb/kafka/connect/source/MongoSourceTask.java#L462

However, if the collection is not provided, I get an error like this:

ERROR WorkerSourceTask{id=mongo-source-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:186)
org.apache.kafka.connect.errors.ConnectException: com.mongodb.MongoCommandException: Command failed with error 73 (InvalidNamespace): '{aggregate: 1} is not valid for '$changeStream'; a collection is required.' on server localhost:27018. The full response is {"operationTime": {"$timestamp": {"t": 1603928795, "i": 1}}, "ok": 0.0, "errmsg": "{aggregate: 1} is not valid for '$changeStream'; a collection is required.", "code": 73, "codeName": "InvalidNamespace", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1603928795, "i": 1}}, "signature": {"hash": {"$binary": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type": "00"}, "keyId": {"$numberLong": "0"}}}}

This is my configuration file:

name=mongo-source
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
tasks.max=1

# Connection and source configuration
connection.uri=mongodb://localhost:27017,localhost:27018/order
database=order
collection=

topic.prefix=redemption
poll.max.batch.size=1000
poll.await.time.ms=5000

# Change stream options
pipeline=[]
batch.size=0
change.stream.full.document=updateLookup
collation=
copy.existing=true
errors.tolerance=all

If a collection is used, I'm able to use the connector and generate topics.

Looking at the logs, it appears the connector is connecting to the db:

INFO Watching for database changes on 'order' (com.mongodb.kafka.connect.source.MongoSourceTask:620)

Source code:

else if (collection.isEmpty()) {
      LOGGER.info("Watching for database changes on '{}'", database);
      MongoDatabase db = mongoClient.getDatabase(database);
      changeStream = pipeline.map(db::watch).orElse(db.watch());
    } else

If I go to my mongo console, I see the following:

rs0:SECONDARY> db.watch()
2020-10-28T18:13:50.344-0600 E QUERY    [thread1] TypeError: db.watch is not a function :
@(shell):1:1
rs0:SECONDARY> db.watch
test.watch

I was using MongoDB 3.6, which supports watching collections but does not support watching databases or deployments (instances), which is why I was getting those errors.
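
To double-check the server version before relying on a database-level change stream, something like this can be used (a minimal sketch with the MongoDB Java sync driver; the connection string is a placeholder for your replica set):

import org.bson.Document;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

public class CheckServerVersion {
  public static void main(String[] args) {
    // Placeholder URI; point this at your own replica set
    try (MongoClient client = MongoClients.create("mongodb://localhost:27017,localhost:27018/?replicaSet=rs0")) {
      // buildInfo reports the server version; database-level watch needs 4.0+
      Document buildInfo = client.getDatabase("admin")
          .runCommand(new Document("buildInfo", 1));
      System.out.println("MongoDB server version: " + buildInfo.getString("version"));
    }
  }
}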

I found this in the documentation:

Starting in MongoDB 4.0, you can open a change stream cursor for a single database (excluding admin, local, and config database) to watch for changes to all its non-system collections.

https://docs.mongodb.com/manual/changeStreams/#watch-collection-database-deployment
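
For reference, this is roughly what the connector's db.watch() fallback does when no collection is set; a minimal standalone sketch with the MongoDB Java sync driver (the connection string and database name are placeholders), which only works against a MongoDB 4.0+ replica set:

import org.bson.Document;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.changestream.ChangeStreamDocument;

public class WatchDatabase {
  public static void main(String[] args) {
    // Placeholder URI; change streams require a replica set (or sharded cluster)
    try (MongoClient client = MongoClients.create("mongodb://localhost:27017,localhost:27018/?replicaSet=rs0")) {
      MongoDatabase db = client.getDatabase("order");
      // Database-level change stream: watches all non-system collections (MongoDB 4.0+)
      try (MongoCursor<ChangeStreamDocument<Document>> cursor = db.watch().iterator()) {
        while (cursor.hasNext()) {
          System.out.println(cursor.next());
        }
      }
    }
  }
}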

You can listen to change streams from multiple mongo collections. You just need to provide a regex for the collection names in pipeline; you can even provide a regex for the database names if you have multiple databases to listen to.

"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collections_.*/}}]}}]"  

You can even use $nin to exclude any database that you don't want to listen to for change streams.

"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/,\"$nin\":[/^any_database_name$/]}},{\"ns.coll\":{\"$regex\":/^collections_.*/}}]}}]"

Here is the complete Kafka connector configuration.

Mongo to Kafka source connector:

{
  "name": "mongo-to-kafka-connect",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "publish.full.document.only": "true",
    "tasks.max": "3",
    "key.converter.schemas.enable": "false",
    "topic.creation.enable": "true",
    "poll.await.time.ms": 1000,
    "poll.max.batch.size": 100,
    "topic.prefix": "any prefix for topic name",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
    "connection.uri": "mongodb://<username>:<password>@ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
    "value.converter.schemas.enable": "false",
    "copy.existing": "true",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 3,
    "topic.creation.compacted.cleanup.policy": "compact",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "mongo.errors.log.enable": "true",
    "heartbeat.interval.ms": 10000,
    "pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-name$/}},{\"ns.coll\":{\"$regex\":/^collections_.*/}}]}}]"
  }
}
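
Assuming the Kafka Connect worker's REST API is reachable on localhost:8083 (an assumption; adjust the URL for your deployment) and the JSON above is saved as mongo-to-kafka-connect.json, the connector can be registered with a POST to /connectors; a minimal sketch in Java 11+:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class RegisterConnector {
  public static void main(String[] args) throws Exception {
    // Both the file name and the Connect REST URL below are assumptions for this example
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8083/connectors"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofFile(Path.of("mongo-to-kafka-connect.json")))
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}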

You can get more details from the official docs.
