Kafka Streams co-partitioning vs. interactive queries

Question

I have the following topology:

topology.addSource(WS_CONNECTION_SOURCE, new StringDeserializer(), new WebSocketConnectionEventDeserializer()
            , utilService.getTopicByType(TopicType.CONNECTION_EVENTS_TOPIC))
            .addProcessor(SESSION_PROCESSOR, WSUserSessionProcessor::new, WS_CONNECTION_SOURCE)
            .addStateStore(sessionStoreBuilder, SESSION_PROCESSOR)
            .addSink(WS_STATUS_SINK, utilService.getTopicByType(TopicType.ONLINE_STATUS_TOPIC),
                    stringSerializer, stringSerializer
                    , SESSION_PROCESSOR)

            //WS session routing
            .addSource(WS_NOTIFICATIONS_SOURCE, new StringDeserializer(), new StringDeserializer(),
                    utilService.getTopicByType(TopicType.NOTIFICATION_TOPIC))
            .addProcessor(WS_NOTIFICATIONS_ROUTE_PROCESSOR, SessionRoutingEventGenerator::new,
                    WS_NOTIFICATIONS_SOURCE)
            .addSink(WS_NOTIFICATIONS_DELIVERY_SINK, new NodeTopicNameExtractor(), WS_NOTIFICATIONS_ROUTE_PROCESSOR)
            .addStateStore(userConnectedNodesStoreBuilder, WS_NOTIFICATIONS_ROUTE_PROCESSOR, SESSION_PROCESSOR);

As you can see, there are two source topics. The state store is built from the first topic, and the second flow reads the state store. When I start the topology, I see that the stream threads are assigned the same partitions of both source topics (co-partitioning). I assume this is because the state store is accessed by the second topic's flow.

Functionally, this works fine, but there is a performance problem. When there is a surge in input volume on the first source topic, which updates the state store, processing of the second topic is delayed.

For me, the second topic should be processed as fast as possible. A delay in processing the first topic is fine.

I am thinking of the following strategy:

Current configuration:
     WS_CONNECTION_SOURCE - 30 partitions
     WS_NOTIFICATIONS_SOURCE - 30 partitions
     streamThreads: 10
     appInstances: 3 

New configuration:
    WS_CONNECTION_SOURCE - 15 partitions
    WS_NOTIFICATIONS_SOURCE - 30 partitions
    streamThreads: 10
    appInstances: 3
    Since there is no co-partitioning, tasks have to use interactive queries to access the store

The idea is that, out of 10 threads, 5 threads will process only the second topic, which can alleviate the current problem when there is a surge in the first topic.

Here are my questions:

1. Is this strategy correct, i.e. avoiding co-partitioning and using interactive queries?
2. Is there a chance that Kafka will assign 10 partitions of WS_CONNECTION_SOURCE 
   to one instance since there are 10 threads and one instance won't get any?
3. Is there any better approach to solve the performance problem?

Answer 1

State stores and Interactive Queries are Kafka Streams abstractions. To use Interactive Queries you have to define a state store (using the Kafka Streams API), and that forces you to have the same number of partitions for the input topics. I think your solution will not work. Interactive Queries exist to expose the ability to query a state store from outside Kafka Streams (not for access from within the Processor API).
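
One way to see why the partition counts matter for a store that is read locally inside a processor: each task holds only the store shard for its own partition number, so a notification is only guaranteed to find a user's session if both topics map that user's key to the same partition number. The sketch below is illustrative and is not code from the question. It assumes both topics are keyed by the same user id, and it uses String.hashCode as a stand-in for the hash in Kafka's default partitioner (which applies murmur2 to the serialized key). The class and method names are hypothetical.

```java
// Illustrative sketch only: models hash-modulo partitioning with plain Java.
class PartitionMismatchSketch {

    // Stand-in for the default partitioner: hash the key, take it modulo the
    // partition count. String.hashCode replaces murmur2 to avoid dependencies.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int connectionPartitions = 15;   // proposed WS_CONNECTION_SOURCE count
        int notificationPartitions = 30; // WS_NOTIFICATIONS_SOURCE count

        String[] userIds = {"user-1", "user-2", "user-3", "user-4"}; // made-up keys
        for (String userId : userIds) {
            int c = partitionFor(userId, connectionPartitions);
            int n = partitionFor(userId, notificationPartitions);
            String note = (c == n)
                    ? "same partition number"
                    : "different partition numbers, so the local store shard would miss";
            System.out.println(userId + ": connection=" + c
                    + ", notification=" + n + " (" + note + ")");
        }
    }
}
```

Under these assumptions, any key whose notification partition is numbered 15 or higher has no connection partition with the same number, so the task handling that notification would not hold the matching session entry in its local store.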

Maybe you can review your SESSION_PROCESSOR source code, extract more of its work into a processor in a separate topology, publish the result to an intermediate topic, and then build the state store from that topic.

Additionally:

Currently, Kafka Streams doesn't support prioritization of input topics. There is a KIP about priorities for source topics: KIP-349. Unfortunately, the linked Jira ticket was closed as Won't Fix (https://issues.apache.org/jira/browse/KAFKA-6690).

Answer 1
0 2020-10-05 14:07:12
