
Akka Kafka consumer processing rate decreases drastically when there is lag in our Kafka partitions

We are facing a scenario where our akka-stream-kafka consumer's processing rate decreases whenever there is lag. When we start it without any lag in the partitions, the processing rate increases suddenly.

MSK cluster - 10 topics - 40 partitions each => 400 total leader partitions

To achieve high throughput and parallelism in the system, we implemented akka-stream-kafka consumers subscribing to each topic-partition separately, resulting in a 1:1 mapping between consumer and partition.
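As a minimal sketch of that wiring (the topic name, group id, bootstrap servers, and String deserializers below are placeholders, not our real values): each stream joins the consumer group, and with more group members than partitions the assignor gives every member at most one partition.

import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import org.apache.kafka.common.serialization.StringDeserializer

implicit val system: ActorSystem = ActorSystem("Consumer1")

// One such source is materialized per topic per consumer slot.
val settings =
  ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
    .withBootstrapServers("localhost:9092") // placeholder
    .withGroupId("consumer1")               // placeholder

val source = Consumer.committableSource(settings, Subscriptions.topics("topic-1"))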

Here is the consumer setup:

  1. Number of EC2 service instances - 7
  2. Each service spins up 6 consumers for each of the 10 topics, resulting in 60 consumers per service instance.
  3. Total consumers = number of instances (7) * number of consumers on each service instance (60) = 420

So, in total we are starting 420 consumers spread across different instances. As per the RangeAssignor partition strategy (the default one), each partition will get assigned to a different consumer: 400 consumers will hold the 400 partitions and 20 consumers will remain unused. We have verified this allocation and it looks good.
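For reference, a sketch of how an allocation like this can be observed at runtime using Alpakka Kafka's partition assignment hooks (the println logging and topic name are illustrative only):

import akka.kafka.{RestrictedConsumer, Subscriptions}
import akka.kafka.scaladsl.PartitionAssignmentHandler
import org.apache.kafka.common.TopicPartition

// Logs whatever partitions the group assignor hands this stream.
val logAssignments = new PartitionAssignmentHandler {
  def onAssign(tps: Set[TopicPartition], c: RestrictedConsumer): Unit = println(s"assigned: $tps")
  def onRevoke(tps: Set[TopicPartition], c: RestrictedConsumer): Unit = println(s"revoked: $tps")
  def onLost(tps: Set[TopicPartition], c: RestrictedConsumer): Unit = println(s"lost: $tps")
  def onStop(tps: Set[TopicPartition], c: RestrictedConsumer): Unit = ()
}

val subscription = Subscriptions.topics("topic-1").withPartitionAssignmentHandler(logAssignments)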

Instance type used: c5.xlarge

MSK config:

Apache Kafka version - 2.4.1.1

Total number of brokers - 9 (spread across 3 AZs)

Broker type: kafka.m5.large

Brokers per zone: 3

auto.create.topics.enable = true

default.replication.factor = 3

min.insync.replicas = 2

num.io.threads = 8

num.network.threads = 5

num.partitions = 40

num.replica.fetchers = 2

replica.lag.time.max.ms = 30000

socket.receive.buffer.bytes = 102400

socket.request.max.bytes = 104857600

socket.send.buffer.bytes = 102400

unclean.leader.election.enable = true

zookeeper.session.timeout.ms = 18000

log.retention.ms = 259200000

This is the configuration we are using for each consumer:

akka.kafka.consumer {
  kafka-clients {
    bootstrap.servers = "localhost:9092"
    client.id = "consumer1"
    group.id = "consumer1"
    auto.offset.reset = "latest"
  }

  aws.glue.registry.name = "Registry1"
  aws.glue.avroRecordType = "GENERIC_RECORD"
  aws.glue.region = "region"

  kafka.value.deserializer.class = "com.amazonaws.services.schemaregistry.deserializers.avro.AWSKafkaAvroDeserializer"

  # Settings for checking the connection to the Kafka broker. Connection checking uses `listTopics`
  # requests with the timeout configured by `consumer.metadata-request-timeout`.
  connection-checker {

    # Flag to turn on the connection checker
    enable = true

    # Number of attempts to be performed after the first connection failure occurs
    # Required, non-negative integer
    max-retries = 3

    # Interval for the connection check. Used as the base for exponential retry.
    check-interval = 15s

    # Check interval multiplier for the backoff interval
    # Required, positive number
    backoff-factor = 2.0
  }
}
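For completeness, a sketch of how this block is picked up in code: ConsumerSettings(system, ...) reads the akka.kafka.consumer section (including its kafka-clients block) from the actor system's configuration. Both deserializers below are stand-ins for the example; in our service the value deserializer is the Glue Avro one named above.

import akka.actor.ActorSystem
import akka.kafka.ConsumerSettings
import org.apache.kafka.common.serialization.StringDeserializer

implicit val system: ActorSystem = ActorSystem("Consumer1")

// Loads `akka.kafka.consumer` (and its kafka-clients block) from application.conf.
val consumerSettings =
  ConsumerSettings(system, new StringDeserializer, new StringDeserializer)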

akka.kafka.committer {

 # Maximum number of messages in a single commit batch
 max-batch = 10000

 # Maximum interval between commits
 max-interval = 5s

 # Parallelism for async committing
 parallelism = 1500

 # API may change.
 # Delivery of commits to the internal actor
 # WaitForAck: Expect replies for commits, and backpressure the stream if replies do not arrive.
 # SendAndForget: Send off commits to the internal actor without expecting replies (experimental feature since 1.1)
 delivery = WaitForAck

 # API may change.
 # Controls when a `Committable` message is queued to be committed.
 # OffsetFirstObserved: When the offset of a message has been successfully produced.
 # NextOffsetObserved: When the next offset is observed.
 when = OffsetFirstObserved
}
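Equivalently in code, a sketch of how this block maps onto CommitterSettings. CommitterSettings(system) already reads the akka.kafka.committer section; the explicit overrides below just mirror that configuration:

import akka.actor.ActorSystem
import akka.kafka.{CommitDelivery, CommitterSettings}

implicit val system: ActorSystem = ActorSystem("Consumer1")

// Equivalent to the `akka.kafka.committer` block above.
val committerSettings =
  CommitterSettings(system)
    .withMaxBatch(10000L)
    .withParallelism(1500)
    .withDelivery(CommitDelivery.WaitForAck)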


akka.http {
 client {
  idle-timeout = 10s
 }
 host-connection-pool {
  idle-timeout = 10s
  client {
   idle-timeout = 10s
  }
 }
}

consumer.parallelism=1500

We are using the code below to materialize the flow from Kafka to an empty sink:

import akka.actor.ActorSystem
import akka.kafka.{CommitterSettings, Subscriptions}
import akka.kafka.scaladsl.{Committer, Consumer}
import akka.kafka.scaladsl.Consumer.DrainingControl
import akka.stream.{ActorAttributes, ActorMaterializer, Supervision}

import scala.concurrent.Future

// conf, consumerSettings, committerSettings, and f are defined elsewhere in the service.
override implicit val actorSystem = ActorSystem("Consumer1")
override implicit val materializer = ActorMaterializer()
override implicit val ec = actorSystem.dispatcher
val topicsName = "Set of Topic Names"
val parallelism = conf.getInt("consumer.parallelism")

// Resume (skip the element) on any failure rather than stopping the stream.
val supervisionDecider: Supervision.Decider = {
  case _ => Supervision.Resume
}

val committer = committerSettings.getOrElse(CommitterSettings(actorSystem))
val supervisionStrategy = ActorAttributes.supervisionStrategy(supervisionDecider)
Consumer
  .committableSource(consumerSettings, Subscriptions.topics(topicsName))
  .mapAsync(parallelism) { msg =>
    // Process the record with f; commit the offset even if processing fails.
    f(msg.record.key(), msg.record.value())
      .map(_ => msg.committableOffset)
      .recoverWith {
        case _ => Future.successful(msg.committableOffset)
      }
  }
  .toMat(Committer.sink(committer).withAttributes(supervisionStrategy))(DrainingControl.apply)
  .withAttributes(supervisionStrategy)
  .run() // materialize and start the stream

Library versions in the code:

"com.typesafe.akka" %% "akka-http"            % "10.1.11",
 "com.typesafe.akka" %% "akka-stream-kafka" % "2.0.3",
 "com.typesafe.akka" %% "akka-stream" % "2.5.30"

The observations are as follows:

  1. In successive intervals of, say, 1 hour, only some of the consumers are actively consuming their lag and processing at the expected rate.
  2. In the next hour, some other consumers become active, consume from their partitions, and then stop processing.
  3. All the lag gets cleared in a single shot, as observed from the offset-lag graph.

We want all the consumers to run in parallel and process the messages in real time. This lag of 3 days in processing is causing major downtime for us. I tried following the given link, but we are already on the fixed version: https://github.com/akka/alpakka-kafka/issues/549

Can anyone help with what we are missing in terms of consumer configuration, or identify some other issue?

Graph of Offset Lag Per Partition Per Topic

That lag graph seems to me to indicate that your overall system isn't capable of handling all the load, and it almost looks like only one partition at a time is actually making progress.

That phenomenon indicates to me that the processing done in f is ultimately gated on the rate at which some queue can be cleared, and that the parallelism in the mapAsync stage is too high, effectively racing the partitions against each other. Since the Kafka consumer batches records (by default in batches of 500, assuming the consumer's lag is more than 500 records), if that parallelism is higher than the batch size, all of those records enter the queue at basically the same time as a block. The parallelism in the mapAsync here is 1500; given the apparent use of the Kafka default batch size of 500, this seems far too high: there's no reason for it to be greater than the Kafka batch size, and if you want a more even consumption rate between partitions, it should be a lot less than that batch size.
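A sketch of the direction this suggests, with illustrative values only (placeholder topic, group, and processing; the point is simply that the mapAsync parallelism stays well below max.poll.records):

import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import org.apache.kafka.common.serialization.StringDeserializer

import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("Example")
import system.dispatcher

val settings =
  ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
    .withBootstrapServers("localhost:9092")  // placeholder
    .withGroupId("example-group")            // placeholder
    .withProperty("max.poll.records", "500") // the Kafka client default, made explicit

val stream =
  Consumer
    .plainSource(settings, Subscriptions.topics("topic-1"))
    .mapAsync(50) { record =>  // 50 << 500: one poll batch can no longer flood the stage
      Future(record.value())   // stand-in for the real processing in f
    }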

Without details on what happens in f, it's hard to say what that queue is and how much the parallelism should be reduced. But there are some general guidelines I can share:

  • If the work is CPU-bound (a sign of this would be very high CPU utilization on your consumers), you have 7 consumers with 4 vCPUs apiece. You cannot physically process more than 28 (7 x 4) records at a time, so the parallelism in the mapAsync shouldn't exceed 1; alternatively, you need more and/or bigger instances.
  • If the work is I/O-bound or otherwise blocking, I would be careful about which threadpool/execution context/Akka dispatcher the work is being done on. All of those will typically only spawn a bounded number of threads and maintain a work queue when all threads are busy; that work queue could very well be the queue of interest. Expanding the number of threads in that pool (or, if using the default execution context or default Akka dispatcher, moving that workload to an appropriately sized pool) will decrease the pressure on the queue; a sketch of a dedicated pool follows this list.
  • Since you're including akka-http, it's possible that the processing of messages in f involves sending an HTTP request to some other service. In that case, it's important to remember that Akka HTTP maintains a queue per targeted host; it's also likely that there's a queue on the target side which governs throughput there. This is somewhat a special case of the second (I/O-bound) situation.
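To make the second point concrete, here is a sketch of moving blocking work onto a dedicated, bounded dispatcher; the dispatcher name, pool size, and the signature of f are illustrative assumptions:

import akka.actor.ActorSystem
import scala.concurrent.{ExecutionContext, Future}

implicit val system: ActorSystem = ActorSystem("Example")

// Assumed to be defined in application.conf:
//
//   blocking-io-dispatcher {
//     type = Dispatcher
//     executor = "thread-pool-executor"
//     thread-pool-executor.fixed-pool-size = 32
//   }
val blockingEc: ExecutionContext = system.dispatchers.lookup("blocking-io-dispatcher")

// Run the blocking body of `f` on the dedicated pool rather than the default dispatcher.
def f(key: String, value: String): Future[Unit] =
  Future {
    // ... blocking I/O goes here ...
  }(blockingEc)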

The I/O-bound/blocking situation will be evidenced by very low CPU utilization on your instances. If you're filling the queue per targeted host, you'll see log messages about "Exceeded configured max-open-requests value".
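If you do see those messages, the relevant knob is the per-host pool's max-open-requests (default 32; the value must be a power of 2). Raising it, sketched below with an illustrative value, only helps if the target service can actually absorb more concurrent requests:

akka.http.host-connection-pool {
  # Default is 32; must be a power of 2. 128 is illustrative only.
  max-open-requests = 128
}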

Another thing worth noting is that, because the Kafka consumer is inherently blocking, the Alpakka Kafka consumer actors run in their own dispatcher, whose size is by default 16, meaning that per host, at most 16 consumers or producers can be doing work at a time. Setting akka.kafka.default-dispatcher.thread-pool-executor.fixed-pool-size to at least the number of consumers your app starts up (42 in your 6-consumers-each-per-7-topics configuration) is probably a good idea. Thread starvation in the Alpakka Kafka dispatcher can cause consumer rebalances, which will disrupt consumption.

Without making any other changes, I would suggest, for a more even consumption rate across partitions, setting:

akka.kafka.default-dispatcher.thread-pool-executor.fixed-pool-size = 42
consumer.parallelism = 50
