
How to group Kafka topics into different databases and collections with the MongoDB sink connector depending on the topic name or message key/value

As the title states, I'm using the Debezium Postgres source connector, and I would like the MongoDB sink connector to group Kafka topics into different collections and databases according to their names (separate databases would also isolate unrelated data). While looking into this, I came across the topics.regex connector property in the MongoDB docs. Unfortunately, that only creates a collection in MongoDB for every Kafka topic successfully matched against the specified regex, and I'm planning to use the same MongoDB server to host many databases captured from multiple Debezium source connectors. Can you help me?
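For reference, the sink configuration I'm experimenting with looks roughly like this (connection string, database and topic names are just placeholders):

    {
      "name": "mongo-sink",
      "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "connection.uri": "mongodb://mongo:27017",
        "database": "cdc_data",
        "topics.regex": "dbserver1\\.public\\..*"
      }
    }

Every topic that matches the regex ends up as its own collection inside that single database, which is exactly the limitation I'd like to work around.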

Note: I read about the MongoDB sink setting FieldPathNamespaceMapper, but I'm not sure whether it fits my needs or how to configure it correctly.

topics.regex is a general sink connector property, not unique to Mongo.

If I understand the problem correctly, then obviously collections will only get created in the configured database for Kafka topics that actually exist (match the pattern) and get consumed by the sink.

If you want collections that don't match the pattern, you'll still need to consume the topics, but you'll have to explicitly rename them via the RegexRouter transform before the records are written to Mongo.
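For example, a minimal sketch of such a transform added to the sink connector config (the regex and replacement here are only illustrative):

    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "dbserver1\\.public\\.(.*)",
    "transforms.route.replacement": "$1"

Since the Mongo sink uses the (renamed) topic name as the collection name by default, records from a topic like dbserver1.public.orders would then land in a collection called orders.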

In Kafka Connect, workers are simple containers that can run multiple connectors. For each connector, the workers generate tasks according to internal rules and your configuration. So, if you take a look at the MongoDB sink connector configuration properties:

https://www.mongodb.com/docs/kafka-connector/current/sink-connector/configuration-properties/all-properties/

You can create different connectors with the same connection.uri, database and collection, or with different values. So you might use the topics.regex or topics parameters to group the topics for a single connector with its own connection.uri, database and collection, and run multiple connectors at the same time. Remember that if tasks.max > 1 in your connector, messages might be read out of order. If that's not a problem, set tasks.max to roughly the number of MongoDB shards; the worker will adjust the number of tasks automatically.
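As a rough sketch of what that could look like, here are two sink connectors grouping topics into different databases on the same MongoDB server (names, URIs and regexes are placeholders, not your actual topology):

    {
      "name": "mongo-sink-sales",
      "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "connection.uri": "mongodb://mongo:27017",
        "database": "sales",
        "topics.regex": "pg1\\.public\\.sales_.*",
        "tasks.max": "1"
      }
    }

    {
      "name": "mongo-sink-inventory",
      "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "connection.uri": "mongodb://mongo:27017",
        "database": "inventory",
        "topics.regex": "pg2\\.public\\.inventory_.*",
        "tasks.max": "1"
      }
    }

Each connector only consumes the topics that match its own regex and writes them into its own database, so a single MongoDB server can hold data from several Debezium source connectors without mixing it.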
