
Why does the kafka-python consumer consume the last committed message again after a restart?

I have a Python Kafka consumer with enable_auto_commit set to False, and I commit messages manually. However, after a restart the consumer consumes the last message from each partition again. Only the last one, no more.

This is what kafka-consumer-groups shows:

TOPIC    PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
my-topic 0          0               1               1
my-topic 1          3               4               1

I don't know why it shows lag, or why the current offset is set to the last message instead of the next one. When I commit offset 3, shouldn't the current offset move to 4?

I commit every message I consume, but then on restart, it always consumes the last message again.

EDIT: This is the code I use:

self.subscriber = kafka.KafkaConsumer(self.consumer_topic,
    client_id=self.consumer_name, group_id=group_id,
    bootstrap_servers=self.consumer_bootstrap_server,
    consumer_timeout_ms=timeout_ms, enable_auto_commit=False)

for record in self.subscriber:
    offset = CommittableOffset(record.topic, record.partition, record.offset)
    # process message
    partition = TopicPartition(record.topic, record.partition)
    offset = OffsetAndMetadata(record.offset, None)

    self.subscriber.commit({partition: offset})

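It turns out the kafka-python library works a little differently from the Java/Scala libraries I was used to. In those libraries, committing a message actually commits the message offset + 1. In kafka-python I have to add 1 to the offset myself. The committed offset is the position of the next message to read, so committing record.offset itself makes the consumer re-read that record after a restart, and kafka-consumer-groups reports a lag of 1.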
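As a minimal sketch of that fix, the commit step from the question could look like the following. The helper names next_offset_to_commit, consumer_lag and commit_record are hypothetical and only serve the illustration. The two-argument OffsetAndMetadata call mirrors the question's code; depending on the installed kafka-python version the tuple may expect additional fields, so treat that line as an assumption to check against your release.

```python
def next_offset_to_commit(last_processed_offset):
    """Kafka stores the position of the NEXT record to read,
    so the value to commit is the last processed offset plus one."""
    return last_processed_offset + 1


def consumer_lag(log_end_offset, committed_offset):
    """Lag as shown by kafka-consumer-groups: LOG-END-OFFSET minus CURRENT-OFFSET."""
    return log_end_offset - committed_offset


def commit_record(consumer, record):
    """Commit `record` as processed on a kafka-python KafkaConsumer (hypothetical helper)."""
    # Imported lazily so the two helpers above work without kafka-python installed.
    from kafka.structs import OffsetAndMetadata, TopicPartition

    partition = TopicPartition(record.topic, record.partition)
    # Two-field form as in the question; newer releases may require more fields.
    offset = OffsetAndMetadata(next_offset_to_commit(record.offset), None)
    consumer.commit({partition: offset})
```

With partition 1 from the table above, committing 3 leaves consumer_lag(4, 3) at 1, while committing next_offset_to_commit(3) brings it to 0. In the loop from the question, calling commit_record(self.subscriber, record) after processing each record would replace the manual commit.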
