
Does a Kafka Streams GlobalKTable topic require the same number of partitions as the KStream topic it is joined with?

We want to use a GlobalKTable in a Kafka Streams application. The input topics (KTable/KStream) have N partitions, and the GlobalKTable will be used as a dictionary in the stream application.

Must the input topic for the GlobalKTable have the same number of partitions as the other input topics (the sources of the KTable/KStream)?

As I understand it, the answer is NO: there is no such restriction, and the topic may have M partitions where N > M, because the GlobalKTable is fully loaded in each instance of the stream application, so co-partitioning is not required for the KStream join. But I need confirmation from the experts!

Thank you!

No. The topics behind a KStream and a GlobalKTable that are joined can have different numbers of partitions.

From the Kafka Streams developer guide:

At a high level, KStream-GlobalKTable joins are very similar to KStream-KTable joins. However, global tables provide you with much more flexibility at some expense when compared to partitioned tables:

  • They do not require data co-partitioning.

More details can be found here:

Global Table join

Join co-partitioning requirements

More precisely:

Why is data co-partitioning required? Because KStream-KStream, KTable-KTable, and KStream-KTable joins are performed based on the keys of records (e.g., leftRecord.key == rightRecord.key), the input streams/tables of a join must be co-partitioned by key.
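To make the key-based argument concrete, here is a minimal Java sketch of why partition counts matter for partitioned joins. It assumes a simplified partitioner: `partitionFor` is a hypothetical stand-in, whereas Kafka's default partitioner actually hashes the serialized key with murmur2. A stream task that reads partition p of both input topics can only join keys that land at the same partition index in both, so the sketch lists the keys that would be split across different indices when the stream topic has N partitions and the table topic has M.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of key-based partitioning, for illustration only.
// ASSUMPTION: partitionFor() is a stand-in; Kafka's default partitioner
// actually applies murmur2 to the serialized key bytes.
final class CoPartitioningSketch {

    // Hypothetical partitioner: maps a key to one of numPartitions partitions.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Returns the keys whose partition index differs between the stream topic
    // and the table topic. A task reading partition p of both topics would
    // never see the matching table record for such a key.
    static List<String> misroutedKeys(List<String> keys, int streamPartitions, int tablePartitions) {
        List<String> misrouted = new ArrayList<>();
        for (String key : keys) {
            if (partitionFor(key, streamPartitions) != partitionFor(key, tablePartitions)) {
                misrouted.add(key);
            }
        }
        return misrouted;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d");
        // Same partitioner and same partition count: no key is misrouted.
        System.out.println(misroutedKeys(keys, 4, 4)); // prints []
        // N = 4 vs M = 3: some keys land at different partition indices.
        System.out.println(misroutedKeys(keys, 4, 3)); // prints [c, d]
    }
}
```

With equal counts every key maps to the same index in both topics; with 4 vs 3 partitions, "c" and "d" end up at different indices, which is exactly the situation co-partitioning rules out for partitioned joins.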

The only exception is KStream-GlobalKTable joins. Here, co-partitioning is not required because all partitions of the GlobalKTable's underlying changelog stream are made available to each KafkaStreams instance, i.e., each instance has a full copy of the changelog stream. Further, a KeyValueMapper allows for non-key-based joins from the KStream to the GlobalKTable.
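The following Java sketch models this exception in memory. It does not use the Kafka Streams API (the real operator is `KStream#join(GlobalKTable, KeyValueMapper, ValueJoiner)`); `GlobalJoinSketch`, `Event`, `loadGlobalTable` and `join` are hypothetical names chosen for illustration. Each instance loads every partition of the dictionary topic into one map, so any stream partition can join against it regardless of the partition counts, and the key selector (playing the KeyValueMapper's role) derives the lookup key from the record value rather than the record key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Simplified in-memory model of a KStream-GlobalKTable join, for illustration
// only; it does not use the Kafka Streams API.
final class GlobalJoinSketch {

    // Hypothetical stream record (key and value), standing in for a KStream record.
    static final class Event {
        final String key;
        final String value;

        Event(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Builds a full copy of the global table by reading ALL partitions of its
    // source topic, however many there are (M).
    static Map<String, String> loadGlobalTable(List<Map<String, String>> topicPartitions) {
        Map<String, String> table = new HashMap<>();
        for (Map<String, String> partition : topicPartitions) {
            table.putAll(partition);
        }
        return table;
    }

    // Inner join of one stream partition against the full global table.
    // keySelector mirrors the KeyValueMapper: it derives the lookup key from
    // (key, value), so the lookup key need not be the record key.
    static List<String> join(List<Event> streamPartition,
                             Map<String, String> globalTable,
                             BiFunction<String, String, String> keySelector,
                             BiFunction<String, String, String> joiner) {
        List<String> joined = new ArrayList<>();
        for (Event event : streamPartition) {
            String lookupKey = keySelector.apply(event.key, event.value);
            String tableValue = globalTable.get(lookupKey);
            if (tableValue != null) { // inner join: records without a match are dropped
                joined.add(joiner.apply(event.value, tableValue));
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Dictionary topic with M = 2 partitions (country code -> name).
        Map<String, String> dictionary = loadGlobalTable(List.of(
                Map.of("DE", "Germany"),
                Map.of("FR", "France")));

        // Two of the N stream partitions: key = order id, value = country code.
        List<Event> partition0 = List.of(new Event("order-1", "FR"), new Event("order-2", "XX"));
        List<Event> partition1 = List.of(new Event("order-3", "DE"));

        BiFunction<String, String, String> byCountryCode = (orderId, code) -> code;
        BiFunction<String, String, String> describe = (code, name) -> code + "=" + name;

        // Both stream partitions see the same full table, so both find matches
        // even though their lookup keys come from different dictionary partitions.
        System.out.println(join(partition0, dictionary, byCountryCode, describe)); // prints [FR=France]
        System.out.println(join(partition1, dictionary, byCountryCode, describe)); // prints [DE=Germany]
    }
}
```

Because the lookup happens against a complete local copy, the number of dictionary partitions (M) never has to line up with the number of stream partitions (N); the trade-off is that every instance stores and updates the whole table.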
