Are Kafka Streams Appropriate for Triggering Batch Processing of Records?

Context

I have three services, each of which generates a JSON payload (and each takes a different amount of time to do so). All three payloads must be combined into a single payload before the message can be processed. This final payload is then sent to another Kafka topic so that another service can consume it.

The diagram below illustrates the problem. The Information Aggregator service receives a request to aggregate information and sends that request to a Kafka topic. Service 1, Service 2 and Service 3 consume the request and each send their data (a JSON payload) to one of three different Kafka topics.

(Diagram: architecture of the application and its main components)

The Information Aggregator has to consume the messages from the three services so that it can generate a final payload (shown as Aggregated Information in the diagram) and send it to another Kafka topic. The messages arrive on their respective topics at very different times: for example, Service 1 takes half an hour to respond, while Services 2 and 3 take under 10 minutes.

Research

After researching Kafka and Kafka Streams extensively, I came across this article, which provides some great insights into how a solution like this could be designed.

In that article, the author consumes messages from a single topic. In my use case I must consume from three different topics and wait until a message with a given ID has arrived on each of them. Only then can I signal my process to consume the three messages with that ID, generate the final message, and send it to another Kafka topic, where another service will consume it.

Thought-Out Solution

My idea is to have a Kafka Streams application watching all three topics. When it sees that all three messages for a given ID are available, it sends a message to a Kafka topic called, for example, TopicEvents. The Information Aggregator consumes from TopicEvents, and each message tells it exactly which records to fetch (topic, partition and offset). It can then build the final payload and send it to another Kafka topic.
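To make the idea concrete, here is a minimal in-memory sketch of the bookkeeping such a stream would need. It is plain Python, not Kafka code: the topic names, `RecordPointer` and `CompletionTracker` are hypothetical names chosen for illustration, and a real implementation would also need persistent state and some way to expire requests that never complete.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical topic names; the post does not name the three service topics.
EXPECTED_TOPICS = frozenset({"service1-out", "service2-out", "service3-out"})


@dataclass(frozen=True)
class RecordPointer:
    """Location of one service response inside Kafka."""
    topic: str
    partition: int
    offset: int


@dataclass
class CompletionTracker:
    """Remembers which responses have arrived for each request ID."""
    pending: Dict[str, Dict[str, RecordPointer]] = field(default_factory=dict)

    def on_record(self, request_id: str, pointer: RecordPointer) -> Optional[dict]:
        """Register one arrival; return a TopicEvents message once all three are in."""
        if pointer.topic not in EXPECTED_TOPICS:
            raise ValueError(f"unexpected topic: {pointer.topic}")
        seen = self.pending.setdefault(request_id, {})
        seen[pointer.topic] = pointer  # a redelivered record overwrites the old pointer
        if set(seen) != EXPECTED_TOPICS:
            return None  # still waiting for at least one service
        del self.pending[request_id]  # free the state held for this request
        return {
            "request_id": request_id,
            "records": [
                {"topic": p.topic, "partition": p.partition, "offset": p.offset}
                for p in sorted(seen.values(), key=lambda p: p.topic)
            ],
        }
```

Each call to `on_record` returns `None` until the third topic reports in. At that point it returns the message that would be written to TopicEvents, listing the three locations the aggregator should read.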

Questions

  • Am I misusing Kafka Streams and batch processing here?

  • How can I signal a Stream that all of the messages have arrived, so that it can place a message in TopicEvents telling the Information Aggregator that the messages in the different topics are ready to be consumed?

Sorry for the long post. Any pointers you can provide would be very helpful. Thank you in advance.

How can I signal a Stream that all of the messages have arrived

You can do this using Kafka Streams joins. Since each join combines only two topics, you'll need two joins to get the event where all three have occurred.

Join TopicA and TopicB to get the event when both A and B have occurred. Then join AB with TopicC to get the event where A, B and C have all occurred.
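The chaining can be illustrated with a small in-memory simulation. This is a sketch of the join logic only, not the Kafka Streams API: `PairJoin` and `build_three_way_join` are made-up names, and payloads are assumed to be dicts that can simply be merged. It also leaves out time windows. Stream-to-stream joins in Kafka Streams are windowed, so in a real topology the window would have to be wide enough to cover the slowest service (half an hour in the question).

```python
class PairJoin:
    """Inner join of two keyed streams: emits once both sides hold the key."""

    def __init__(self, emit):
        self.left = {}    # key -> value seen on the left input
        self.right = {}   # key -> value seen on the right input
        self.emit = emit  # callback receiving (key, (left_value, right_value))

    def on_left(self, key, value):
        self.left[key] = value
        self._try_emit(key)

    def on_right(self, key, value):
        self.right[key] = value
        self._try_emit(key)

    def _try_emit(self, key):
        # Fire only when both sides have arrived, then drop the buffered values.
        if key in self.left and key in self.right:
            self.emit(key, (self.left.pop(key), self.right.pop(key)))


def build_three_way_join():
    """Chain two pairwise joins: (A join B) join C."""
    results = []
    # Second join: the AB result against TopicC; its output is the final event.
    abc = PairJoin(lambda key, pair: results.append((key, {**pair[0], **pair[1]})))
    # First join: TopicA against TopicB; its output feeds the left side of abc.
    ab = PairJoin(lambda key, pair: abc.on_left(key, {**pair[0], **pair[1]}))
    return ab.on_left, ab.on_right, abc.on_right, results


on_a, on_b, on_c, results = build_three_way_join()
on_b("id-1", {"b": 2})  # fast services answer first
on_c("id-1", {"c": 3})
on_a("id-1", {"a": 1})  # slow service arrives last and completes the join
print(results)
```

Nothing is emitted until the last of the three payloads for an ID arrives, regardless of arrival order, which is the "all messages have arrived" signal the question asks for.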
