Kafka Connect JDBC distributed mode

We are building a Kafka Connect application using the JDBC source connector in timestamp+incrementing mode. We tried standalone mode and it works as expected. Now we would like to switch to distributed mode.

When we have a single Hive table as the source, how will the tasks be distributed among the workers?

The problem we faced is that when we run the application as multiple instances, every instance queries the table and fetches the same rows again. Will parallelism work in this case? If so, how will the tasks coordinate with each other on the current state of the table?

The tasks.max parameter makes no difference for the kafka-connect-jdbc source/sink connector. The property does not occur anywhere in the source code of the JDBC connector project.

Consult the JDBC source config options for the properties available for this connector.
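For illustration, here is a minimal sketch of how a connector like the one in the question might be registered in distributed mode. In that mode a connector is normally submitted once to the Connect cluster through its REST API (`POST /connectors`), rather than by starting several independent instances that each query the table. The connector name, JDBC URL, table, column names, and topic prefix below are hypothetical placeholders, not values from the question, and the REST endpoint assumes a worker listening on the default port 8083.

```python
import json
import urllib.request

# Assumption: a Connect worker is reachable on the default REST port.
CONNECT_URL = "http://localhost:8083/connectors"


def build_jdbc_source_payload(name, jdbc_url, table, ts_col, id_col, topic_prefix):
    """Build the JSON body for POST /connectors (all argument values are placeholders)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": jdbc_url,
            "table.whitelist": table,
            # Detect new and updated rows using both columns.
            "mode": "timestamp+incrementing",
            "timestamp.column.name": ts_col,
            "incrementing.column.name": id_col,
            "topic.prefix": topic_prefix,
            # A single source table; one task is assumed to be enough here.
            "tasks.max": "1",
        },
    }


def submit_connector(payload, url=CONNECT_URL):
    """Send the connector definition to the Connect cluster (not called below)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical example values, for illustration only.
payload = build_jdbc_source_payload(
    name="example-jdbc-source",
    jdbc_url="jdbc:hive2://example-host:10000/default",
    table="example_table",
    ts_col="updated_at",
    id_col="id",
    topic_prefix="example-",
)
print(json.dumps(payload, indent=2))
```

Calling `submit_connector(payload)` once against any worker in the cluster should be enough. The workers are expected to share the connector definition and its stored offsets, so the table is not re-read from the start by each instance.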
