
What is the relationship between connectors and tasks in Kafka Connect?

We've been using Kafka Connect for a while on a project, currently entirely using only the Confluent Kafka Connect JDBC connector. I'm struggling to understand the role of 'tasks' in Kafka Connect, and specifically with this connector. I understand 'connectors'; they encompass a bunch of configuration about a particular source/sink and the topics they connect from/to. I understand that there's a 1:many relationship between connectors and tasks, and the general principle that tasks are used to parallelize work. However, how can we understand when a connector will/might create multiple tasks?

  • In the source connector case, we are using the JDBC connector to pick up source data by timestamp and/or a primary key, and so this seems in its very nature sequential. Indeed, all of our source connectors only ever seem to have one task. What would ever trigger Kafka Connect to create more than one task? Currently we are running Kafka Connect in distributed mode, but only with one worker; if we had multiple workers, might we get multiple tasks per connector, or are the two not related?

  • In the sink connector case, we are explicitly configuring each of our sink connectors with tasks.max=1, and so unsurprisingly we only ever see one task for each connector there too. If we removed that configuration, presumably we could/would get more than one task. Would this mean the messages on our input topic might be consumed out of sequence? In which case, how is data consistency for changes assured?
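For reference, tasks.max is just one key in the connector's configuration submitted to the Connect worker; a minimal JDBC sink configuration along these lines might look as follows (the connector name, topic, and connection URL here are illustrative, not taken from the question):

```json
{
  "name": "example-jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "example-topic",
    "connection.url": "jdbc:postgresql://db:5432/example"
  }
}
```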

Also, from time to time, we have seen situations where a single connector and its task will both enter the FAILED state (because of input connectivity issues). Restarting the task will remove it from this state and restart the flow of data, but the connector remains in the FAILED state. How can this be - isn't the connector's state just the aggregate of all its child tasks?
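The behaviour described above reflects the fact that the Connect REST API exposes separate status and restart endpoints for the connector object and for each of its tasks, so restarting one does not automatically restart the other. Assuming a worker listening on localhost:8083 and a connector named my-connector (both illustrative names), the relevant calls are:

```shell
# Inspect the connector's own state and the state of each of its tasks
curl -s localhost:8083/connectors/my-connector/status

# Restart only task 0 (clears that task's FAILED state, not the connector's)
curl -s -X POST localhost:8083/connectors/my-connector/tasks/0/restart

# Restart the connector object itself
curl -s -X POST localhost:8083/connectors/my-connector/restart
```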

A task is a thread that performs the actual sourcing or sinking of data.

The number of tasks per connector is determined by the connector's implementation. Take a Debezium source connector for MySQL as an example: since a MySQL instance writes to exactly one binlog file at a time, and that file has to be read sequentially, the connector generates exactly one task regardless of tasks.max.
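The hook behind this is the connector's taskConfigs(maxTasks) method: the framework passes in the tasks.max ceiling, and the connector returns one configuration map per task it wants started. The real API is Java; the following is a rough Python sketch of the two behaviours just described, with all names and config keys being illustrative:

```python
def binlog_task_configs(max_tasks):
    # A binlog must be read sequentially, so a Debezium-style connector
    # ignores max_tasks and always returns a single task configuration.
    return [{"reader": "binlog"}]

def jdbc_task_configs(tables, max_tasks):
    # A table-based JDBC source can parallelize: split the table list
    # into at most max_tasks groups, one group per task.
    n = min(len(tables), max_tasks)
    groups = [[] for _ in range(n)]
    for i, table in enumerate(tables):
        groups[i % n].append(table)
    return [{"tables": ",".join(g)} for g in groups]

print(len(binlog_task_configs(4)))            # always a single task
print(jdbc_task_configs(["a", "b", "c"], 2))  # three tables shared by two tasks
```

This is also why a JDBC source running a single incrementing/timestamp query only ever shows one task: there is only one sequential unit of work to hand out.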

For sink connectors, by contrast, the useful number of tasks goes up to the number of partitions of the topic (capped by tasks.max); any tasks beyond the partition count would simply sit idle.
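This also answers the ordering worry in the question: each sink task is one consumer in the connector's consumer group, and Kafka assigns every partition to exactly one consumer, so ordering is still preserved within each partition even with multiple tasks; only across partitions is there no global order. A small sketch of that assignment (plain Python, not Connect code; the round-robin scheme is illustrative):

```python
def assign_partitions(num_partitions, num_tasks):
    # Each partition goes to exactly one task, so per-partition order
    # is preserved; tasks beyond the partition count get nothing.
    assignment = {t: [] for t in range(num_tasks)}
    for p in range(num_partitions):
        assignment[p % num_tasks].append(p)
    return assignment

print(assign_partitions(6, 3))  # every task owns two whole partitions
print(assign_partitions(2, 4))  # tasks 2 and 3 are idle
```

Consistency for changes to a given key is therefore assured as long as the producer keys messages so that all changes for one entity land in the same partition.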

The distribution of tasks among workers is determined by task rebalancing, a process very similar to Kafka consumer group rebalancing.
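In the spirit of that rebalance, the cluster spreads all tasks across whatever workers are currently alive, and a worker joining or leaving triggers a redistribution. A toy sketch of an even, round-robin spread (the worker and task names are made up; the real protocol is more involved):

```python
def distribute(task_ids, workers):
    # Spread the tasks evenly over the live workers, round-robin.
    return {w: [t for i, t in enumerate(task_ids) if i % len(workers) == wi]
            for wi, w in enumerate(workers)}

tasks = ["conn-a-0", "conn-a-1", "conn-b-0"]
print(distribute(tasks, ["worker1", "worker2"]))
# Losing a worker triggers a rebalance: everything lands on the survivors.
print(distribute(tasks, ["worker1"]))
```

This is why running a single worker (as in the question) is not what limits the task count: the number of tasks comes from the connector and tasks.max, while the workers only determine where those tasks run.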

