

Fast Processing Topic and Slow Processing Topic - Akka Kafka

Original post: 2021-09-22 21:12:15. Tags: java, scala, apache-kafka, akka-stream, akka-kafka

I have a problem where I need to prioritize events: some must be processed early, and others only after the high-priority events. All events come from one source, and I need to route each one, based on the priority of its event type, to either the high-priority or the low-priority sink. I'm using Kafka and Akka Kafka streams. The main problem is that I get a lot of traffic at a given point in time. What would be the preferred approach here?

1 Answer

The first thing to tackle is the offset commit. Because processing will not be in order, committing offsets after processing cannot guarantee at-least-once (nor can it guarantee at-most-once), because the following sequence is possible (and its probability cannot be reduced to zero):

1. The offset is committed for a high-priority message which was processed before several earlier low-priority messages have been processed
2. The stream fails (or the instance running the stream is stopped, or whatever)
3. The stream restarts from the last committed offset
4. The low-priority messages are never read from Kafka again, so they never get processed
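
The sequence above can be replayed with a small in-memory simulation. This is only a sketch in plain Java, not Kafka or Akka code: `CommitAfterProcessingDemo`, `Msg`, and `lostOffsets` are hypothetical names, the list stands in for one partition, and the "crash" is assumed to happen right after the high-priority messages are committed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one Kafka partition and its committed offset.
final class CommitAfterProcessingDemo {

    record Msg(long offset, boolean highPriority) {}

    /** Returns the offsets that are never processed when a crash follows the early commit. */
    static List<Long> lostOffsets(List<Msg> partition) {
        List<Long> processed = new ArrayList<>();
        long committed = 0; // next offset to read after a restart

        // First run: high-priority messages are processed, and committed, first.
        for (Msg m : partition) {
            if (m.highPriority()) {
                processed.add(m.offset());
                committed = Math.max(committed, m.offset() + 1);
            }
        }
        // Assumed crash here: the low-priority messages were only buffered in memory.

        // Restart: consumption resumes from the committed offset.
        for (Msg m : partition) {
            if (m.offset() >= committed) {
                processed.add(m.offset());
            }
        }

        List<Long> lost = new ArrayList<>();
        for (Msg m : partition) {
            if (!processed.contains(m.offset())) {
                lost.add(m.offset());
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        List<Msg> partition = List.of(
            new Msg(0, false), new Msg(1, false), new Msg(2, true), new Msg(3, false));
        System.out.println(lostOffsets(partition)); // prints [0, 1]
    }
}
```

Offsets 0 and 1 sit below the committed offset, so the restarted consumer never sees them again.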

This suggests that either the offset commit has to happen before the reordering, or we need a notion of processed-but-not-yet-committable until the low-priority messages have been processed. For the latter option, tracking the greatest offset not committed (the simplest strategy which could possibly work) will not work if anything can create gaps in the offset sequence, so it implies infinite retention and no compaction. I'd therefore suggest committing the offsets before processing, but only once the processing logic has guaranteed that it will eventually process the message.
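
To make the processed-but-not-yet-committable option concrete, here is one possible reading of it as a plain Java sketch. `ContiguousOffsetTracker` and `markProcessed` are hypothetical names, not part of Kafka or Akka, and the code assumes every offset in the partition is eventually delivered. It only advances the committable offset over a contiguous run of processed offsets.

```java
import java.util.TreeSet;

// Hypothetical tracker: an offset is committable only when everything below it is processed.
final class ContiguousOffsetTracker {
    private final TreeSet<Long> processed = new TreeSet<>();
    private long nextToCommit; // every offset below this value has been processed

    ContiguousOffsetTracker(long startOffset) {
        this.nextToCommit = startOffset;
    }

    /** Records a processed offset and returns the offset that is now safe to commit. */
    long markProcessed(long offset) {
        if (offset >= nextToCommit) {
            processed.add(offset);
        }
        // Advance over the contiguous run of processed offsets, if any.
        while (processed.remove(nextToCommit)) {
            nextToCommit++;
        }
        return nextToCommit;
    }

    public static void main(String[] args) {
        ContiguousOffsetTracker dense = new ContiguousOffsetTracker(0);
        System.out.println(dense.markProcessed(2)); // prints 0: offsets 0 and 1 are still pending
        System.out.println(dense.markProcessed(0)); // prints 1
        System.out.println(dense.markProcessed(1)); // prints 3

        // Offset 2 is assumed to have been removed by compaction, so it never arrives.
        ContiguousOffsetTracker gappy = new ContiguousOffsetTracker(0);
        gappy.markProcessed(0);
        gappy.markProcessed(1);
        System.out.println(gappy.markProcessed(3)); // prints 2: stuck behind the gap
    }
}
```

In the second case the committable offset can never move past the missing offset, which is why this strategy depends on a gap-free offset sequence.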

A combination of actors and Akka Persistence allows this approach to be taken. The rough outline is to have a persistent actor (this is a good fit for event sourcing) which basically maintains lists of high-priority and low-priority messages to process. The stream sends an "ask" carrying the message from Kafka to the actor. On receipt, and assuming that the message hasn't already been processed, the actor classifies it as high- or low-priority. The message (and perhaps its classification) is persisted as an event. The actor then acknowledges receipt of the message, and commits to processing it by scheduling a message to itself to fully process a "to-process" message. The acknowledgement completes the ask, allowing the offset to be committed to Kafka. On receipt of the message (a command, really) to process a message, the actor chooses the Kafka message to process (by priority, age, etc.) and persists that it has processed that message (thus moving it from "to-process" to "processed"). It potentially also persists an event updating state relevant to how it interprets Kafka messages. After this persistence, the actor sends another command to itself to process a "to-process" message.
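
The state machine described above can be sketched without any Akka dependency. Everything here is an assumption made for illustration: `PriorityProcessor`, `KafkaMsg`, the `Accepted` and `Processed` events, and the rule that event types starting with "high" are high-priority are all hypothetical, the list `journal` stands in for the Akka Persistence journal, and deduplication by offset assumes one such object per Kafka partition.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, single-threaded stand-in for the persistent actor's state and handlers.
final class PriorityProcessor {
    record KafkaMsg(long offset, String eventType, String payload) {}

    interface Event {}
    record Accepted(KafkaMsg msg, boolean highPriority) implements Event {}
    record Processed(long offset) implements Event {}

    private final List<Event> journal = new ArrayList<>(); // stand-in for the durable journal
    private final Deque<KafkaMsg> high = new ArrayDeque<>();
    private final Deque<KafkaMsg> low = new ArrayDeque<>();
    private final Set<Long> seen = new HashSet<>(); // offsets already accepted

    /** Ask handler: returns the ack once the message is recorded (or was already known). */
    boolean accept(KafkaMsg msg) {
        if (!seen.contains(msg.offset())) {
            persist(new Accepted(msg, isHighPriority(msg)));
        }
        return true; // after this ack the stream may commit the offset
    }

    /** "Process a to-process message" command: high priority first, oldest first. */
    Optional<KafkaMsg> processNext() {
        KafkaMsg next = !high.isEmpty() ? high.peekFirst() : low.peekFirst();
        if (next == null) {
            return Optional.empty();
        }
        // The real work on the message would happen here.
        persist(new Processed(next.offset()));
        return Optional.of(next);
    }

    /** Rebuilds the state by replaying a previously stored journal. */
    static PriorityProcessor recover(List<Event> events) {
        PriorityProcessor p = new PriorityProcessor();
        for (Event e : events) {
            p.persist(e);
        }
        return p;
    }

    List<Event> journal() {
        return List.copyOf(journal);
    }

    // Assumed classification rule, purely for the example.
    private static boolean isHighPriority(KafkaMsg msg) {
        return msg.eventType().startsWith("high");
    }

    private void persist(Event e) {
        journal.add(e);
        apply(e);
    }

    private void apply(Event e) {
        if (e instanceof Accepted a) {
            seen.add(a.msg().offset());
            (a.highPriority() ? high : low).addLast(a.msg());
        } else if (e instanceof Processed p) {
            high.removeIf(m -> m.offset() == p.offset());
            low.removeIf(m -> m.offset() == p.offset());
        }
    }

    public static void main(String[] args) {
        PriorityProcessor actor = new PriorityProcessor();
        actor.accept(new KafkaMsg(0, "low.audit", "a"));
        actor.accept(new KafkaMsg(1, "low.audit", "b"));
        actor.accept(new KafkaMsg(2, "high.alert", "c"));
        System.out.println(actor.processNext().get().offset()); // prints 2

        // Simulated restart: the low-priority messages survive in the journal.
        PriorityProcessor restarted = recover(actor.journal());
        System.out.println(restarted.processNext().get().offset()); // prints 0
    }
}
```

Because the `Accepted` event is recorded before the ack is returned, a restart after the offset commit still finds the low-priority messages in the journal. In a real system the journal write is asynchronous and the ack would be sent from the persist callback.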

Fault tolerance is then achieved by having a background process periodically ping this actor with the "process a to-process message" command.

As with the stream, this is a single-logical-thread-per-partition process. It's possible that you are multiplexing many partitions' worth of state per physical Kafka partition, in which case you can have several of these actors and send multiple asks from the ingest stream. If you do this, the periodic ping is likely best accomplished by a stream fed by an Akka Persistence Query that gets the identifiers of all the persistent actors.

Note that the reordering in this problem makes it fundamentally a race and thus non-deterministic. In this design sketch, the race arises because messages M1 from actor B and M2 from actor C, both sent to actor A, may be received in either order (if actor B sent a message M3 to actor A after it sent M1, M3 would arrive after M1 but could arrive before or after M2). In a different design, the race could depend on the speed of processing relative to the latency for Kafka to make a message available for consumption.