
Kafka Connect - how many tasks per connector

As I see from the documentation and other references, it seems the connector will be instantiated with a single task no matter the value defined through the property (tasks.max).

  1. Will the tasks.max property have any impact, for example in the case of failover? Say tasks.max is set to 2 and a JDBC connector is running with a single task: if that task fails, will another one take over?
  2. What is the significance of distributed mode in this case if, effectively, the connector is created with a single task?

For the source connector, as linked, this is because it uses a single Change Stream cursor. With more than one task reading from it, how exactly would you expect them to avoid conflicts, such as reading the same data and duplicating it into the topic?

Connect runs sources and sinks. Many sources only support a single task, but it depends on their internal threading model; for example, you could have one task per collection/table, but if there's only one unified item, such as a change stream or binlog, then there can only be one task. You've mentioned JDBC; however, Debezium would be preferred for CDC, if it supports your database.
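To make the "depends on the connector" point concrete, here is a rough Python sketch of the decision a connector makes when the framework asks it for task configurations (in the Java API this happens in `Connector.taskConfigs(int maxTasks)`). The connector-level setting is `tasks.max`, and it acts as an upper bound only. The function names, the round-robin split, and the `tables`/`stream` keys below are made up for illustration; real connectors use their own grouping logic and config keys.

```python
from typing import Dict, List


def table_task_configs(tables: List[str], max_tasks: int) -> List[Dict[str, str]]:
    """Hypothetical table-based source: split tables into at most max_tasks groups."""
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    # Never create more tasks than there are tables to read.
    num_groups = min(len(tables), max_tasks)
    groups: List[List[str]] = [[] for _ in range(num_groups)]
    for i, table in enumerate(tables):
        groups[i % num_groups].append(table)  # simple round-robin assignment
    return [{"tables": ",".join(group)} for group in groups]


def change_stream_task_configs(stream_uri: str, max_tasks: int) -> List[Dict[str, str]]:
    """Hypothetical change-stream source: one cursor cannot be split, so one task."""
    # max_tasks is ignored on purpose; a second task would re-read the same events.
    return [{"stream": stream_uri}]


if __name__ == "__main__":
    print(table_task_configs(["orders", "customers", "items"], 2))
    print(change_stream_task_configs("mongodb://localhost:27017", 2))
```

With three tables and a limit of 2, the first function yields two task configs; the change-stream variant yields one regardless of the limit, which matches the behaviour described in the question.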

Distribution is also for fault tolerance, not just scalability. Only some exceptions are recoverable and allow the task to be restarted on another node.
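In practice this means a single-task connector still benefits from distributed mode when a worker goes away, but a task that died with an unrecoverable exception generally sits in the `FAILED` state until someone restarts it. A minimal sketch of checking for that through the Connect REST API (`GET /connectors/{name}/status` and `POST /connectors/{name}/tasks/{id}/restart`) could look like the following. The worker URL and helper names are assumptions for illustration, and blindly restarting will not help if the underlying error persists.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: a Connect worker listening on the default REST port on localhost.
CONNECT_URL = "http://localhost:8083"


def failed_task_ids(status: Dict[str, Any]) -> List[int]:
    """Return the ids of tasks reported as FAILED in a connector status document."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") == "FAILED"]


def get_status(connector: str) -> Dict[str, Any]:
    """Fetch the status document for one connector."""
    with urllib.request.urlopen(f"{CONNECT_URL}/connectors/{connector}/status") as resp:
        return json.load(resp)


def restart_task(connector: str, task_id: int) -> None:
    """Ask the cluster to restart a single task."""
    req = urllib.request.Request(
        f"{CONNECT_URL}/connectors/{connector}/tasks/{task_id}/restart",
        data=b"",
        method="POST",
    )
    with urllib.request.urlopen(req):
        pass  # the response body is not needed here


def restart_failed_tasks(connector: str) -> List[int]:
    """Restart every FAILED task of a connector and return the ids that were restarted."""
    failed = failed_task_ids(get_status(connector))
    for task_id in failed:
        restart_task(connector, task_id)
    return failed
```

Calling `restart_failed_tasks("my-jdbc-source")` (a placeholder connector name) against a running cluster would restart whatever tasks are currently failed; only `failed_task_ids` is pure and can be tried without a cluster.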


 