Kafka python client alternating between assigned and rebalancing, not processing data
I have a Kafka topic with 40 partitions in a Kubernetes cluster, and a microservice that consumes from this topic.
Sometimes, within a batch process, a few partitions are left with unprocessed data while most partitions are already finished. Inspecting the group with kafka-consumer-groups.sh, it looks like this:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
- - - - - kafka-python-2.0.1-f1259971-c8ed-4d98-ba37-40f263b14a78/10.44.2.119 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-328f6a97-22ea-4f59-b702-4173feb9f025/10.44.0.29 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-9a2ea04e-3bf1-40f4-9262-6c14d0791dfc/10.44.7.35 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-81f5be15-535c-436c-996e-f8098d0613a1/10.44.4.26 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-ffcf76e2-f0ed-4894-bc70-ee73220881db/10.44.14.2 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-fc5709a0-a0b5-4324-92ff-02b6ee0f1232/10.44.2.123 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-c058418c-51ec-43e2-b666-21971480665b/10.44.15.2 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-0c14afab-af2a-4668-bb3c-015932fbfd13/10.44.14.5 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-1cb308f0-203f-43ae-9252-e0fc98eb87b8/10.44.14.4 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-42753a7f-80d0-481e-93a6-67445cb1bb5e/10.44.14.6 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-63e97395-e1ec-4cab-8edc-c5dd251932af/10.44.2.122 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-7116fdc2-809f-4f99-b5bd-60fbf2aba935/10.44.1.37 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-f5ef8ff1-f09c-498e-9b27-1bcac94b895b/10.44.2.125 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-8feec117-aa3a-42c0-91e8-0ccefac5f134/10.44.2.121 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-45cc5605-d3c8-4c77-8ca8-88afbde81a69/10.44.14.3 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-9a575ac4-1531-4b2a-b516-12ffa2496615/10.44.5.32 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-d33e112b-a1f4-4699-8989-daee03a5021c/10.44.14.7 kafka-python-2.0.1
my-topic 20 890 890 0 - - -
my-topic 38 857 857 0 - - -
my-topic 28 918 918 0 - - -
my-topic 23 66 909 843 - - -
my-topic 10 888 888 0 - - -
my-topic 2 885 885 0 - - -
my-topic 7 853 853 0 - - -
my-topic 16 878 878 0 - - -
my-topic 15 47 901 854 - - -
my-topic 26 934 934 0 - - -
my-topic 32 898 898 0 - - -
my-topic 21 921 921 0 - - -
my-topic 13 933 933 0 - - -
my-topic 5 879 879 0 - - -
my-topic 12 945 945 0 - - -
my-topic 4 918 918 0 - - -
my-topic 29 924 924 0 - - -
my-topic 39 895 895 0 - - -
my-topic 25 30 926 896 - - -
my-topic 9 915 915 0 - - -
my-topic 35 31 890 859 - - -
my-topic 3 69 897 828 - - -
my-topic 1 911 911 0 - - -
my-topic 6 22 901 879 - - -
my-topic 14 41 881 840 - - -
my-topic 30 900 900 0 - - -
my-topic 22 847 847 0 - - -
my-topic 8 919 919 0 - - -
my-topic 0 902 902 0 - - -
my-topic 18 924 924 0 - - -
my-topic 36 864 864 0 - - -
my-topic 34 929 929 0 - - -
my-topic 24 864 864 0 - - -
my-topic 19 937 937 0 - - -
my-topic 27 859 859 0 - - -
my-topic 11 838 838 0 - - -
my-topic 31 49 922 873 - - -
my-topic 37 882 882 0 - - -
my-topic 17 942 942 0 - - -
my-topic 33 928 928 0 - - -
The tool further states that the consumer group is rebalancing. One thing to note here is that fewer consumers are listed under CONSUMER-ID than there should be: it should be 20 consumers, but only 17 are shown even though all pods are running. This number varies, and I am not sure whether it is an output issue or whether they are really gone. This also baffles me because it does not happen when I start fresh (all-new Kafka and consumer deployments). So it really seems to be related to consumer deployments being scaled, or otherwise killed.
For a short time the consumers get assigned, and after about half a minute the same picture as above shows again, with the consumer group rebalancing. This also happens when I scale down, e.g. to only 4 consumers. I am not sure what is happening here. The pods all run, and I use the same kind of base code and pattern in other microservices, where it seems to work fine.
I suspect it has something to do with a consumer pod getting killed because, as I said, a fresh deployment works initially. This batch job is also a bit more long-running than my others, so a pod kill is more likely during its run. I am also not sure whether it matters that most partitions are already finished; that could just be a quirk of my use case.
I noticed this because the processing seemed to take forever, yet new data was still being processed. So I think that in the brief moment when the consumers are assigned, they process data but never commit the offsets before being rebalanced, leaving them in an infinite loop. The only slightly related thing I found was this issue, but it is from quite a few versions back and does not fully describe my situation.
I use the kafka-python client and the Kafka image confluentinc/cp-kafka:5.0.1.
I create the topic using the admin client with NewTopic(name='my-topic', num_partitions=40, replication_factor=1) and create the consumer like so:
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(consume_topic,
                         bootstrap_servers=bootstrap_servers,
                         group_id=consume_group_id,
                         value_deserializer=lambda m: json.loads(m))
for message in consumer:
    process(message)
What is going wrong here? Do I have a configuration error? Any help is greatly appreciated.
The issue was with the heartbeat configuration. It turns out that while most messages take only seconds to process, a few take very long. In those special cases the heartbeat update took too long for some of the consumers, so the broker assumed the consumer was down and started a rebalance.
I assume what happened next is that the consumers got reassigned to the same message, took too long to process it again, and triggered yet another rebalance, resulting in an endless cycle.
I finally solved it by increasing both session_timeout_ms and heartbeat_interval_ms in the consumer (documented here). I also decreased the batch size so that the heartbeat is updated more regularly.