简体   繁体   中英

Reason on uneven lag on kafka partitions

I have a kafka topic partitioned into 60 partitions. Producer distributes messages in a default round-robin manner. So far so good. But when it comes to consume, a lag is growing. Nothing surprising. The surprise for me is how the lag is sometimes distributed:

GROUP       TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                             HOST           CLIENT-ID
a_group     a_topic     24         2013567         2013573         6               kafka-python-2.0.2-50435c75-9f50-4053-b6d1-5b0c0a59a927 /88.88.49.27   kafka-python-2.0.2
a_group     a_topic     25         2013105         2013118         13              kafka-python-2.0.2-50435c75-9f50-4053-b6d1-5b0c0a59a927 /88.88.49.27   kafka-python-2.0.2
a_group     a_topic     26         2011359         2011365         6               kafka-python-2.0.2-50435c75-9f50-4053-b6d1-5b0c0a59a927 /88.88.49.27   kafka-python-2.0.2
a_group     a_topic     0          2015024         2015025         1               kafka-python-2.0.2-011c0c93-27e7-4c4f-a0c1-909213db6164 /88.88.55.212  kafka-python-2.0.2
a_group     a_topic     1          2011950         2011950         0               kafka-python-2.0.2-011c0c93-27e7-4c4f-a0c1-909213db6164 /88.88.55.212  kafka-python-2.0.2
a_group     a_topic     2          2013069         2013069         0               kafka-python-2.0.2-011c0c93-27e7-4c4f-a0c1-909213db6164 /88.88.55.212  kafka-python-2.0.2
a_group     a_topic     9          2010089         2010464         375             kafka-python-2.0.2-1602a64f-c1d3-43ea-9090-bef8ac9ce411 /88.88.30.213  kafka-python-2.0.2
a_group     a_topic     10         2011620         2011870         250             kafka-python-2.0.2-1602a64f-c1d3-43ea-9090-bef8ac9ce411 /88.88.30.213  kafka-python-2.0.2
a_group     a_topic     11         2011571         2012179         608             kafka-python-2.0.2-1602a64f-c1d3-43ea-9090-bef8ac9ce411 /88.88.30.213  kafka-python-2.0.2
a_group     a_topic     12         2011892         2011893         1               kafka-python-2.0.2-26775cd6-966e-40af-b38e-29cd8a5d2519 /88.88.71.88   kafka-python-2.0.2
a_group     a_topic     13         2012234         2012235         1               kafka-python-2.0.2-26775cd6-966e-40af-b38e-29cd8a5d2519 /88.88.71.88   kafka-python-2.0.2
a_group     a_topic     14         2011195         2011196         1               kafka-python-2.0.2-26775cd6-966e-40af-b38e-29cd8a5d2519 /88.88.71.88   kafka-python-2.0.2
a_group     a_topic     15         2009822         2009826         4               kafka-python-2.0.2-33c2cc43-c328-41c1-b1a5-684a3a970231 /88.88.28.166  kafka-python-2.0.2
a_group     a_topic     16         2013803         2013804         1               kafka-python-2.0.2-33c2cc43-c328-41c1-b1a5-684a3a970231 /88.88.28.166  kafka-python-2.0.2
a_group     a_topic     17         2014154         2014156         2               kafka-python-2.0.2-33c2cc43-c328-41c1-b1a5-684a3a970231 /88.88.28.166  kafka-python-2.0.2
a_group     a_topic     21         2016745         2016746         1               kafka-python-2.0.2-4f570aa2-b328-414f-b954-dfa94ec3a170 /88.88.69.73   kafka-python-2.0.2
a_group     a_topic     22         2013930         2013932         2               kafka-python-2.0.2-4f570aa2-b328-414f-b954-dfa94ec3a170 /88.88.69.73   kafka-python-2.0.2
a_group     a_topic     23         2009764         2009765         1               kafka-python-2.0.2-4f570aa2-b328-414f-b954-dfa94ec3a170 /88.88.69.73   kafka-python-2.0.2
a_group     a_topic     30         2012859         2012859         0               kafka-python-2.0.2-596a8b98-ef88-4a9f-8509-3a6432f8b693 /88.88.55.176  kafka-python-2.0.2
a_group     a_topic     31         2012474         2012474         0               kafka-python-2.0.2-596a8b98-ef88-4a9f-8509-3a6432f8b693 /88.88.55.176  kafka-python-2.0.2
a_group     a_topic     32         2011670         2011672         2               kafka-python-2.0.2-596a8b98-ef88-4a9f-8509-3a6432f8b693 /88.88.55.176  kafka-python-2.0.2
a_group     a_topic     18         2010723         2010723         0               kafka-python-2.0.2-41fdba78-5e87-4cfc-80d2-96c5810aedce /88.88.39.202  kafka-python-2.0.2
a_group     a_topic     19         2012150         2012150         0               kafka-python-2.0.2-41fdba78-5e87-4cfc-80d2-96c5810aedce /88.88.39.202  kafka-python-2.0.2
a_group     a_topic     20         2013224         2013225         1               kafka-python-2.0.2-41fdba78-5e87-4cfc-80d2-96c5810aedce /88.88.39.202  kafka-python-2.0.2
a_group     a_topic     3          2009559         2009559         0               kafka-python-2.0.2-0d9ceb1f-0517-4896-a0c4-56d2255a2805 /88.88.91.145  kafka-python-2.0.2
a_group     a_topic     4          2013069         2013071         2               kafka-python-2.0.2-0d9ceb1f-0517-4896-a0c4-56d2255a2805 /88.88.91.145  kafka-python-2.0.2
a_group     a_topic     5          2010672         2010674         2               kafka-python-2.0.2-0d9ceb1f-0517-4896-a0c4-56d2255a2805 /88.88.91.145  kafka-python-2.0.2
a_group     a_topic     33         2013696         2013696         0               kafka-python-2.0.2-62d88a9e-5f7e-4133-86f5-d6fa372d6b13 /88.88.20.100  kafka-python-2.0.2
a_group     a_topic     34         2009381         2009381         0               kafka-python-2.0.2-62d88a9e-5f7e-4133-86f5-d6fa372d6b13 /88.88.20.100  kafka-python-2.0.2
a_group     a_topic     35         2012901         2012902         1               kafka-python-2.0.2-62d88a9e-5f7e-4133-86f5-d6fa372d6b13 /88.88.20.100  kafka-python-2.0.2
a_group     a_topic     6          2011941         2011942         1               kafka-python-2.0.2-15647249-be71-4945-9b47-ff0b508978bc /88.88.55.113  kafka-python-2.0.2
a_group     a_topic     7          2011215         2011215         0               kafka-python-2.0.2-15647249-be71-4945-9b47-ff0b508978bc /88.88.55.113  kafka-python-2.0.2
a_group     a_topic     8          2011842         2011845         3               kafka-python-2.0.2-15647249-be71-4945-9b47-ff0b508978bc /88.88.55.113  kafka-python-2.0.2
a_group     a_topic     27         2012368         2012370         2               kafka-python-2.0.2-514bd2a8-1332-45f4-b240-3029fa3c471f /88.88.85.27   kafka-python-2.0.2
a_group     a_topic     28         2012921         2012925         4               kafka-python-2.0.2-514bd2a8-1332-45f4-b240-3029fa3c471f /88.88.85.27   kafka-python-2.0.2
a_group     a_topic     29         2008526         2008530         4               kafka-python-2.0.2-514bd2a8-1332-45f4-b240-3029fa3c471f /88.88.85.27   kafka-python-2.0.2
a_group     a_topic     56         2011315         2011315         0               kafka-python-2.0.2-f45f80be-49fe-4d4a-b91e-3f4dbc91afc1 /88.88.84.132  kafka-python-2.0.2
a_group     a_topic     57         2013118         2013118         0               kafka-python-2.0.2-f45f80be-49fe-4d4a-b91e-3f4dbc91afc1 /88.88.84.132  kafka-python-2.0.2
a_group     a_topic     50         2010863         2010865         2               kafka-python-2.0.2-e648adf2-2674-45b2-84f6-71a508d36270 /88.88.72.52   kafka-python-2.0.2
a_group     a_topic     51         2011747         2011749         2               kafka-python-2.0.2-e648adf2-2674-45b2-84f6-71a508d36270 /88.88.72.52   kafka-python-2.0.2
a_group     a_topic     36         2012567         2012568         1               kafka-python-2.0.2-65437c50-b4b8-4f4c-a396-5fc38b15232e /88.88.33.201  kafka-python-2.0.2
a_group     a_topic     37         2012459         2012460         1               kafka-python-2.0.2-65437c50-b4b8-4f4c-a396-5fc38b15232e /88.88.33.201  kafka-python-2.0.2
a_group     a_topic     58         2014003         2014003         0               kafka-python-2.0.2-fada16f0-1ee1-4397-afbe-225a69718181 /88.88.14.72   kafka-python-2.0.2
a_group     a_topic     59         2012843         2012845         2               kafka-python-2.0.2-fada16f0-1ee1-4397-afbe-225a69718181 /88.88.14.72   kafka-python-2.0.2
a_group     a_topic     48         2013246         2013246         0               kafka-python-2.0.2-e56dd468-f905-49ac-84db-493e75013973 /88.88.81.234  kafka-python-2.0.2
a_group     a_topic     49         2011876         2011877         1               kafka-python-2.0.2-e56dd468-f905-49ac-84db-493e75013973 /88.88.81.234  kafka-python-2.0.2
a_group     a_topic     54         2015293         2015294         1               kafka-python-2.0.2-f31dd554-6b24-4623-8475-7df47728b608 /88.88.55.197  kafka-python-2.0.2
a_group     a_topic     55         2013566         2013567         1               kafka-python-2.0.2-f31dd554-6b24-4623-8475-7df47728b608 /88.88.55.197  kafka-python-2.0.2
a_group     a_topic     52         2012371         2012371         0               kafka-python-2.0.2-ea20755a-9d43-4234-a5e8-f25ae62f4d2c /88.88.7.18    kafka-python-2.0.2
a_group     a_topic     53         2009590         2009590         0               kafka-python-2.0.2-ea20755a-9d43-4234-a5e8-f25ae62f4d2c /88.88.7.18    kafka-python-2.0.2
a_group     a_topic     40         2014499         2014500         1               kafka-python-2.0.2-8dfc86d8-0d59-4ab8-a042-9217d8ed185b /88.88.91.216  kafka-python-2.0.2
a_group     a_topic     41         2010624         2010625         1               kafka-python-2.0.2-8dfc86d8-0d59-4ab8-a042-9217d8ed185b /88.88.91.216  kafka-python-2.0.2
a_group     a_topic     46         2013292         2013293         1               kafka-python-2.0.2-c148c72c-43b4-4547-b2c2-cd58810b94f3 /88.88.18.170  kafka-python-2.0.2
a_group     a_topic     47         2009611         2009611         0               kafka-python-2.0.2-c148c72c-43b4-4547-b2c2-cd58810b94f3 /88.88.18.170  kafka-python-2.0.2
a_group     a_topic     38         2012883         2012883         0               kafka-python-2.0.2-6a8e6817-edca-4c6d-b35a-50c908040c7b /88.88.29.146  kafka-python-2.0.2
a_group     a_topic     39         2013983         2013983         0               kafka-python-2.0.2-6a8e6817-edca-4c6d-b35a-50c908040c7b /88.88.29.146  kafka-python-2.0.2
a_group     a_topic     42         2011620         2011620         0               kafka-python-2.0.2-b1816769-1189-4f5f-a5f4-bded61383c8b /88.88.25.36   kafka-python-2.0.2
a_group     a_topic     43         2011800         2011802         2               kafka-python-2.0.2-b1816769-1189-4f5f-a5f4-bded61383c8b /88.88.25.36   kafka-python-2.0.2
a_group     a_topic     44         2012131         2012131         0               kafka-python-2.0.2-b9111d9a-940f-4f3c-9087-6b739abf06b4 /88.88.54.169  kafka-python-2.0.2
a_group     a_topic     45         2010538         2010538         0               kafka-python-2.0.2-b9111d9a-940f-4f3c-9087-6b739abf06b4 /88.88.54.169  kafka-python-2.0.2

The above example shows that almost the entire lag resides on one consumer. This provides to a situation where lag is not consumed to 0 because one consumer cannot handle so many messages.

You can also see (on OFFSET fields) that messages are distributed quite equally.

So what can be the reason of such uneven lag distribution? Any ideas? Any clues how to debug/investigate to find the reason?

After a few days of investigation I've found the reason of the uneven lag in our case. I decided to share my experience so it could help others.

In our case there reason was relate with consumers scaling setup and the default partition_assignment_strategy on the consumer side.

We are using lag-based KEDA scaler for Apache Kafka and had a default range -based partition_assignment_strategy . The problem was that we had set up to gentle lagThreshold . Because of it scaling was going up and down but not going to the max number of instances at any moment. So most of the time partitions where assigned to consumers unevenly. For example, depending on the number of instances, some consumers had 3 partitions assigned while other had 4 partitions assigned. As all partitions had similar amount of messages and consumers were consuming in a pretty similar time, the consumers with larger amount of partitions had more messages to consume. Meanwhile the consumers with smaller number of partitions assigned had enough time to consume their messages. Everything went to the situation that our too gentle lagThreshold and range -based partition_assignment_strategy caused the majority of lag were accumulated on the lower-number partitions.

A bit helpful in this situation was switching consumer partition_assignment_strategy to round robin which causes that consumer are assigned in a bit different way to partitions. Unfortunately this did not helped entirely. It just helped to move accumulation of lag to a different partitions and shuffle it a bit.

The final solution for us was to set lagThreshold more aggressively, so consumer instances where scaled up to max 60 instances. One consumer instance per partition, in our case, cause that consumers are able to consume messages equally.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM