
Need to schedule MongoDB Kafka Connect

We are working with the MongoDB Kafka connector on top of the open-source Apache Kafka Connect framework, ingesting JSON data from MongoDB into HDFS. We have a Kafka consumer that reads the change events from Kafka and writes them to HDFS files.

We want to schedule the source connectors to run at specific times.

We need to trigger a Kafka message based on a scheduled date.
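Outside of Kafka Connect itself, one generic way to fire a message at a scheduled date is to compute the target time and hand the produce call to a timer. Below is a minimal sketch using Python's standard-library `sched` module; the `send()` function is a placeholder standing in for a real Kafka producer call (the topic name and payload are made up for illustration):

```python
import sched
import time
from datetime import datetime, timedelta

scheduler = sched.scheduler(time.time, time.sleep)
sent = []

def send(topic, payload):
    # Placeholder for a real producer call,
    # e.g. KafkaProducer(...).send(topic, payload)
    sent.append((topic, payload))

# Schedule a message for a specific wall-clock time (here: 1 second from now).
target = datetime.now() + timedelta(seconds=1)
scheduler.enterabs(target.timestamp(), priority=1,
                   action=send, argument=("mongo.ingest", b'{"op": "start"}'))

scheduler.run()  # blocks until all scheduled events have fired
print(sent)
```

For production use you would typically delegate this to an external scheduler (cron, Airflow, or a Kafka-native scheduler as discussed below) rather than a long-running in-process timer.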

We can handle this scenario using the source connector's configuration properties from Confluent, customizing the polling interval.

Link:

https://www.mongodb.com/docs/kafka-connector/current/source-connector/configuration-properties/all-properties/#std-label-source-configuration-all-properties

==> poll.await.time.ms can be a solution
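For example, a source connector configuration that lengthens the change-stream polling interval might look like the following. Only the property keys come from the linked documentation; the connector name, connection URI, and database/collection values are placeholders. The sketch builds the config and shows how it could be submitted to a Kafka Connect worker's REST API (which requires a running worker, so `submit()` is not called here):

```python
import json
import urllib.request

# Property keys are from the MongoDB Kafka source connector docs;
# the name/URI/database/collection values below are placeholders.
connector_config = {
    "name": "mongo-source-scheduled",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",
        "database": "mydb",
        "collection": "mycoll",
        # Wait 60 seconds between checks of the change stream cursor.
        "poll.await.time.ms": "60000",
    },
}

def submit(connect_url="http://localhost:8083/connectors"):
    """POST the config to the Kafka Connect REST API (needs a running worker)."""
    req = urllib.request.Request(
        connect_url,
        data=json.dumps(connector_config).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

print(connector_config["config"]["poll.await.time.ms"])
```

Note that this lengthens the polling cadence; it does not start or stop the connector at a specific wall-clock time.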

Otherwise, there is a Kafka message scheduler:

https://github.com/etf1/kafka-message-scheduler
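With that project, a schedule is an ordinary Kafka message carrying special headers that tell the scheduler when and where to deliver the payload. The header names below are taken from my reading of the project's README and should be verified against the version you deploy; the producer call is shown only as a comment:

```python
import time

def build_schedule_headers(epoch_seconds, target_topic, target_key):
    """Headers understood by etf1/kafka-message-scheduler (names per its
    README; verify against the version you deploy).
    Kafka record headers are (str, bytes) pairs."""
    return [
        ("scheduler-epoch", str(epoch_seconds).encode()),
        ("scheduler-target-topic", target_topic.encode()),
        ("scheduler-target-key", target_key.encode()),
    ]

# Schedule delivery for one hour from now.
headers = build_schedule_headers(int(time.time()) + 3600, "mongo.ingest", "job-1")

# A real send would look roughly like this (placeholder, not executed here):
# producer.send("schedules", key=b"job-1", value=payload, headers=headers)
print([name for name, _ in headers])
```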

Automatically Consume Data From Kafka with the Scheduler

When you create a new scheduler, the vkconfig script takes the following steps:

Creates a new Vertica schema using the name you specified for the scheduler. You use this name to identify the scheduler during configuration.

Creates the tables needed to manage the Kafka data load in the newly created schema.

From the official MongoDB Kafka Connect documentation:

https://www.mongodb.com/docs/kafka-connector/current/source-connector/configuration-properties/all-properties/#change-streams

Use the following configuration settings to specify aggregation pipelines for change streams and read preferences for change stream cursors.

poll.await.time.ms == The amount of time in milliseconds to wait before checking the change stream cursor for new results.

Or use: poll.max.batch.size == Maximum number of documents to read in a single batch when polling a change stream cursor for new data. You can use this setting to limit the amount of data buffered internally in the connector.
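Taken together, these two settings bound how much data the connector pulls per poll cycle: at most poll.max.batch.size documents every poll.await.time.ms milliseconds. A back-of-the-envelope estimate of the throughput ceiling (ignoring processing time, and assuming the documented defaults of 5000 ms and 1000 documents):

```python
def max_docs_per_second(poll_await_time_ms, poll_max_batch_size):
    """Rough throughput ceiling: one full batch per polling interval."""
    return poll_max_batch_size / (poll_await_time_ms / 1000)

# With a 5000 ms await time and 1000 docs per batch,
# the ceiling is 200 documents per second.
print(max_docs_per_second(5000, 1000))
```

If your change volume exceeds this ceiling, either shorten poll.await.time.ms or raise poll.max.batch.size (at the cost of more internal buffering).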
