
confluent_kafka consumer offset count reset problem

Description

I've been trying to test the correctness of a chunk of data I send to Kafka. While experimenting with multiprocessing in Fabric, I messed up the process as well as the message consumer. The message consumer did not get shut down correctly at first, and then it stopped consuming messages altogether.

After that I re-started Kafka on my local machine. I'm using Docker, so I used

docker-compose -f docker-compose-single-broker.yml rm

to remove the Kafka instance I had been using for testing, and re-created a new one with

docker-compose -f docker-compose-single-broker.yml up

After Kafka and kafka-manager were up and running, I found that although I had not transferred any messages to Kafka, the offset value of the topic I used for testing was not reset to 0. [screenshot] For the data in the picture:

"gateway" is the consumer I've been using before and after I re-started kafka.

"gateway_tester" was the topic that I used to send test messages.

"End 54"(value in red) was the number of data consumed from this topic after I re-started kafka.

"Offset 899"(value in blue) was the number of data consumed from this topic before I re-started kafka.

I'm confused about why the offset number doesn't get reset after I re-start Kafka.

When I used this consumer after re-starting Kafka, it would consume all the data I sent to Kafka, because the number of messages was less than 899...

Then I created a new consumer called "gateway_2" to consume data from the same topic. [screenshot]

As shown in the picture, the offset count matched the End value this time, and everything works fine. If I send data to this topic and consume it with the new consumer "gateway_2", it consumes the new messages I sent to the topic and ignores the messages it has consumed before. (My offset setting is 'auto.offset.reset': 'smallest'.)

I'm wondering: is there a way to reset the offset count on the consumer I used before, or is creating a new consumer the only way to solve this problem?

Reproduce

1) Start Kafka, create a consumer, and consume some data to change the offset count on that consumer.

2) Shut down kafka.

3) Re-start Kafka and use the same consumer to consume messages.

4) The consumer will consume all the data in the topic until the amount of data in that topic reaches the stored offset count.

Configs

  • confluent-kafka-python and librdkafka version: confluent_kafka.version() reports 0.11.4, and kafka-python is 1.3.5. (I could not find the confluent_kafka.libversion() value because the project I'm working on uses pip to manage Python packages and confluent_kafka.libversion doesn't show up in the requirements.txt file... see the note after this list.)

  • Apache Kafka broker version: 0.9.0.1

  • Client configuration:

    KAFKA_HOST = '0.0.0.0'

    KAFKA_PORT = 9092

    KAFKA_HOST_PORT = '%(host)s:%(port)s' % { 'host': KAFKA_HOST, 'port': KAFKA_PORT, }

    kafka_configuration = {
        'bootstrap.servers': KAFKA_HOST_PORT,
        'session.timeout.ms': 6000,
        'default.topic.config': {'auto.offset.reset': 'smallest'},
    }

(In my class initializer I set group.id to gateway, and to gateway_2 for the new consumer; a full consumer sketch follows this list.)

  • Operating system: macOS 10.13.6
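
As an aside on the version bullet above: confluent_kafka.libversion() is a function exposed by the installed package, not a separate pip dependency, which is why it never appears in requirements.txt. Both versions can be printed from a Python shell:

    import confluent_kafka

    print(confluent_kafka.version())     # Python client version, e.g. ('0.11.4', ...)
    print(confluent_kafka.libversion())  # underlying librdkafka version

And here is a minimal sketch of how a consumer with the configuration above might be put together. The topic name 'gateway_tester' and the group.id 'gateway' come from the description; the polling loop is standard confluent-kafka-python usage, not the author's actual class:

    from confluent_kafka import Consumer, KafkaError

    conf = {
        'bootstrap.servers': '0.0.0.0:9092',
        'group.id': 'gateway',  # or 'gateway_2' for the new consumer
        'session.timeout.ms': 6000,
        'default.topic.config': {'auto.offset.reset': 'smallest'},
    }

    consumer = Consumer(conf)
    consumer.subscribe(['gateway_tester'])

    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                continue
            if msg.error():
                # Reaching the end of a partition is not a real error.
                if msg.error().code() == KafkaError._PARTITION_EOF:
                    continue
                raise Exception(msg.error())
            print(msg.value())
    finally:
        consumer.close()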

'auto.offset.reset': 'smallest' means that if there is no stored offset for the consumer group (or the stored offset is out of range), the offset will be set to the smallest value available.

Once you have consumed messages from Kafka and the offsets have been committed, there is already offset info, so the offset will not be the smallest. When you restart the Kafka consumer, it will consume messages from where you stopped last time.

Maybe you could try setting enable.auto.commit to false, which disables automatic offset commits. If that does not work, you may need to seek the offset to the smallest value every time you restart the consumer, if you prefer to consume from the earliest message; see the sketch below.
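
A minimal sketch of that seek-to-smallest approach, assuming the same topic and configuration as above: confluent-kafka-python lets you pass an on_assign callback to subscribe(), and setting each assigned partition's offset to OFFSET_BEGINNING before calling assign() rewinds the consumer regardless of any committed offset:

    from confluent_kafka import Consumer, OFFSET_BEGINNING

    conf = {
        'bootstrap.servers': '0.0.0.0:9092',
        'group.id': 'gateway',
        'enable.auto.commit': False,  # optional, as suggested above
        'default.topic.config': {'auto.offset.reset': 'smallest'},
    }

    def on_assign(consumer, partitions):
        # Rewind every assigned partition to the earliest available offset,
        # overriding whatever offset the group has committed.
        for p in partitions:
            p.offset = OFFSET_BEGINNING
        consumer.assign(partitions)

    consumer = Consumer(conf)
    consumer.subscribe(['gateway_tester'], on_assign=on_assign)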

I also posted this question as an issue on confluent-kafka-python's GitHub page, and my question got solved by a contributor.

Here's the link to the issue: https://github.com/confluentinc/confluent-kafka-python/issues/455

In summary, the contributor @rnpridgeon says that 'Restarting a broker alone is not enough to remove offsets. You will need to delete the backing volume as well as it stores the contents of the __consumer_offsets topic which stores your consumer groups offsets.'

After that I checked the Docker docs (https://docs.docker.com/compose/reference/rm/) and found out that my command docker-compose -f docker-compose-single-broker.yml rm is not enough to remove the anonymous volumes attached to the container.

Instead, I should have used the command docker-compose -f docker-compose-single-broker.yml rm -v

Then my problem was solved: the offset value gets reset after I re-create Kafka using the above command.
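
Putting it together, the full reset sequence would look like this (docker-compose rm only removes stopped containers, so a stop comes first; the -v flag is what removes the anonymous volumes that back the __consumer_offsets topic):

docker-compose -f docker-compose-single-broker.yml stop

docker-compose -f docker-compose-single-broker.yml rm -v

docker-compose -f docker-compose-single-broker.yml up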

