
Setting an initial “current-offset” and “lag” for new consumer groups for a given topic

I'm working on a product that may add/remove consumer groups depending on how a user uses the product.

enable.auto.commit is turned off in our product and instead we commit the offset every time after we receive the data.
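
To illustrate the commit-after-receive pattern described above, here is a minimal sketch. It assumes a consumer created with 'enable.auto.commit': false that emits 'data' events and exposes commitMessage(message), which is the shape of node-rdkafka's KafkaConsumer. The helper commitAfterHandling and the stand-in RecordingConsumer are hypothetical names used only so the example runs without a broker.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: commit a message only after `handle` has processed it.
function commitAfterHandling(consumer, handle) {
  consumer.on('data', (message) => {
    handle(message);
    consumer.commitMessage(message);
  });
}

// Stand-in for a real consumer so the sketch runs without a broker.
// It records the offset a commit would store: by Kafka convention the
// committed offset is the offset of the NEXT message to read.
class RecordingConsumer extends EventEmitter {
  constructor() {
    super();
    this.committed = [];
  }
  commitMessage(message) {
    this.committed.push(message.offset + 1);
  }
}

const handled = [];
const recording = new RecordingConsumer();
commitAfterHandling(recording, (message) => handled.push(message.value));

// Simulate one delivery on partition 4 (illustrative values).
recording.emit('data', { topic: 'philz-topic', partition: 4, offset: 33, value: 'payload' });
console.log(handled, recording.committed);
```

Because the committed value is "last processed offset + 1", a fully caught-up partition shows CURRENT-OFFSET equal to LOG-END-OFFSET and a LAG of 0.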

We recently implemented a service that pauses/resumes the product. The Kafka library (in Node.js) did not yet have pause/resume functions available, so I ended up unsubscribing from and re-subscribing to the topic instead, per consumer group, which seems to work as we intended.
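
The unsubscribe/subscribe workaround described above could be wrapped roughly as follows. This is only a sketch: makePausable is a hypothetical helper, and it assumes the consumer exposes subscribe(topics) and unsubscribe(), as node-rdkafka's KafkaConsumer does. A stand-in consumer that records calls is used so the example runs without a broker.

```javascript
// Hypothetical wrapper that emulates pause/resume with unsubscribe/subscribe.
function makePausable(consumer, topics) {
  let paused = false;
  return {
    pause() {
      if (paused) return; // already paused, nothing to do
      consumer.unsubscribe();
      paused = true;
    },
    resume() {
      if (!paused) return; // not paused, nothing to do
      consumer.subscribe(topics);
      paused = false;
    },
    isPaused() {
      return paused;
    },
  };
}

// Stand-in consumer that only records which calls were made.
const calls = [];
const standInConsumer = {
  subscribe(topics) { calls.push(['subscribe', topics]); },
  unsubscribe() { calls.push(['unsubscribe']); },
};

const control = makePausable(standInConsumer, ['philz-topic']);
control.pause();
control.pause(); // ignored: already paused
control.resume();
console.log(calls);
```

On re-subscribing, the consumer's starting position in each partition comes from the group's committed offset, if one exists; that detail is what matters for the behavior described below.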

The only problem occurs when a new consumer group is added. First, let me explain the behavior I'm seeing:

Here is the information for consumer "group1":

$ bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group philz-topic-group1

TOPIC                          PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG        CONSUMER-ID                                       HOST                           CLIENT-ID
philz-topic                    1          33              33              0          rdkafka-3ac4d56e-e94b-4365-9af7-04e485502b5d      /10.233.113.109                rdkafka
philz-topic                    4          34              34              0          rdkafka-d642805c-f5ea-4450-9cb0-3272fcbbffc9      /10.233.88.251                 rdkafka
philz-topic                    0          23              23              0          rdkafka-12cfca8b-fd61-4a68-bc5f-1946c8ef4eb1      /10.233.120.55                 rdkafka
philz-topic                    2          26              26              0          rdkafka-7561ca2a-9894-4a3d-83fe-d379bbe64fdf      /10.233.126.40                 rdkafka
philz-topic                    3          20              20              0          rdkafka-cd9d5ed6-7daa-4b75-8f39-6704c8d887ed      /10.233.119.133                rdkafka

And here is the information for consumer "group2". This group was just added and has completed one operation, so CURRENT-OFFSET and LAG have been updated only for the single partition that received that operation's message:

$ bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group philz-topic-group2

TOPIC                          PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG        CONSUMER-ID                                       HOST                           CLIENT-ID
philz-topic                    3          -               20              -          rdkafka-b56306e1-b4b7-43fe-a604-ab7c12f70e9f      /10.233.119.133                rdkafka
philz-topic                    1          -               33              -          rdkafka-76c9a4d2-268b-4ebb-94a8-f1230c9bbfea      /10.233.113.109                rdkafka
philz-topic                    4          34              34              0          rdkafka-d412e574-8241-48c6-af26-c50be44eb51d      /10.233.126.40                 rdkafka
philz-topic                    0          -               23              -          rdkafka-33179a7d-cb9f-453a-83c6-e7e4780372b6      /10.233.88.251                 rdkafka
philz-topic                    2          -               26              -          rdkafka-77506e87-b666-4c92-82df-82071e2ff801      /10.233.120.55                 rdkafka

If a new consumer group was added and no operations were completed, no information about the consumer group is shown with the above command.
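
For reference, the LAG column in the listings above is LOG-END-OFFSET minus CURRENT-OFFSET, and it is undefined ("-") for partitions where the group has not committed anything yet. A minimal sketch using the group2 rows, with null standing in for "-" (the function name lag is just illustrative):

```javascript
// Rows copied from the philz-topic-group2 listing; null stands for "-".
const group2 = [
  { partition: 3, committed: null, logEnd: 20 },
  { partition: 1, committed: null, logEnd: 33 },
  { partition: 4, committed: 34, logEnd: 34 },
  { partition: 0, committed: null, logEnd: 23 },
  { partition: 2, committed: null, logEnd: 26 },
];

// Lag is only defined once the group has committed an offset for the partition.
function lag(row) {
  return row.committed === null ? null : row.logEnd - row.committed;
}

for (const row of group2) {
  const value = lag(row);
  console.log(`partition ${row.partition}: lag ${value === null ? '-' : value}`);
}
```

Only partition 4 has a committed offset, so it is the only one with a numeric lag.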

The problem I'm currently facing shows up when a pause/resume happens while some partitions of a consumer group do not yet have a CURRENT-OFFSET and LAG. If an operation completes while the group is unsubscribed (paused), the affected partition should then show a LAG of 1. But if the new consumer group had no previous CURRENT-OFFSET and LAG for that partition, the message is skipped and never seen by the consumer group.

My question is, when creating a new consumer group, can we update the CURRENT-OFFSET for the group to match the LOG-END-OFFSET for all available partitions?

I'm not super familiar with Kafka, so any explanation on behavior here is appreciated.

My guess is that, because we commit offsets ourselves (enable.auto.commit is turned off), information for the new consumer group only becomes visible once an operation occurs, and then only the one partition that just received data is updated with a CURRENT-OFFSET.

Thanks!

Edit:

Also, in my examples, I have 5 consumers per consumer group and 5 partitions, so one consumer per partition is expected.

Thanks to cricket_007 for pointing out the Kafka consumer option needed to do this.

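The consumer option auto.offset.reset controls where a consumer starts reading a partition when its group has no committed offset for it (or the committed offset is out of range). Setting it to 'earliest' makes a new group start from the oldest available message in each partition instead of from the end of the log, so messages produced before the group's first commit are not skipped. Once those messages are consumed and committed, CURRENT-OFFSET catches up to LOG-END-OFFSET.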

In order to set this option with the node library, simply:

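const Kafka = require('node-rdkafka'); // assumed package: the API used here matches node-rdkafka

// The second argument is the default topic configuration.
const consumer = new Kafka.KafkaConsumer(config, {
    'auto.offset.reset': 'earliest'
});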

Here config holds the key/value configuration for the consumer, and the second parameter holds the key/value pairs used as the default topic configuration.

auto.offset.reset is a topic-level setting configured on the consumer, as documented here: https://github.com/edenhill/librdkafka/blob/0.11.1.x/CONFIGURATION.md
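
To make the effect of auto.offset.reset on the scenario from the question concrete, here is a small model of how the starting offset is chosen. It is a sketch, not library code: resolveStartOffset is a hypothetical function, low and high stand for the oldest available offset and the log end offset of a partition, and the numbers assume that partition 3 (LOG-END-OFFSET 20) received one message while the group was unsubscribed and that nothing has been deleted by retention (so low is 0).

```javascript
// Hypothetical model of the start-position decision; not part of any Kafka client.
function resolveStartOffset({ committed, low, high }, policy) {
  // A committed offset within the available range always wins.
  if (committed !== null && committed >= low && committed <= high) {
    return committed;
  }
  // Otherwise the reset policy decides.
  if (policy === 'earliest') return low;
  if (policy === 'latest') return high;
  throw new Error(`unsupported auto.offset.reset value: ${policy}`);
}

// Partition 3 after one message (offset 20) was produced during the pause.
const afterPause = { low: 0, high: 21 };

// New group, nothing committed, policy 'latest': starts at 21, offset 20 is never read.
const skipped = resolveStartOffset({ committed: null, ...afterPause }, 'latest');

// New group, nothing committed, policy 'earliest': starts at 0, offset 20 is included.
const replayed = resolveStartOffset({ committed: null, ...afterPause }, 'earliest');

// Group that had committed 20 before the pause: resumes at 20 regardless of policy.
const resumed = resolveStartOffset({ committed: 20, ...afterPause }, 'latest');

console.log({ skipped, replayed, resumed });
```

Note the trade-off: with 'earliest', a brand-new group also reads every older message still retained in the partition, not just the ones produced while it was paused.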
