
Python Kafka consumer with offset management

I am a newbie to Kafka and I am trying to set up a consumer so that it reads messages published by a Kafka producer. Correct me if I am wrong, but the way I understand it, a Kafka consumer stores its offset in ZooKeeper? However, I don't have a ZooKeeper instance running, and I want to poll, say, every 5 minutes to see if any new messages have been published.

So far, the code that I have is:

import json
import sys

import kafka

bootstrap_servers = ['localhost:8080']
topicName = 'test-info'
consumer = kafka.KafkaConsumer(topicName,
                               group_id='test',
                               bootstrap_servers=bootstrap_servers,
                               auto_offset_reset='earliest')

count = 0
try:
    for message in consumer:
        print("\n")
        print("<>" * 20)
        print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition, message.offset, message.key, message.value))
        print("--" * 20)
        info = json.loads(message.value)

        if info['event'] == "new_record" and info['data']['userId'] == "user1" and info['data']['details']['userTeam'] == "foo":
            count = count + 1
            print(count, info['data']['details']['team'], info['data']['details']['leadername'], info['data']['details']['category'])
        else:
            print("Skipping")

    print(count)

except KeyboardInterrupt:
    sys.exit()

How can I save the offset so that the next time it polls, it reads only the new (incremental) data? Any pointers will help.

  1. It's true that the Kafka consumer stores offsets in ZooKeeper. Since you don't have ZooKeeper installed, Kafka is probably using its built-in ZooKeeper.

  2. In your case, you don't have to do anything more, as you have already set group_id='test'. The consumer will therefore automatically continue consuming from the last committed offset for that group, because it commits the latest offset automatically (enable_auto_commit is True by default). For more info you can check here.

  3. If you want to check every 5 minutes whether any new messages have been published, you can add time.sleep(300) to your consumer loop, as in the sketch below.
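
For example, a minimal polling sketch along those lines (assuming the broker address and topic from the question; adjust bootstrap_servers to wherever your broker actually listens):

import json
import time

from kafka import KafkaConsumer

consumer = KafkaConsumer('test-info',
                         group_id='test',
                         bootstrap_servers=['localhost:8080'],
                         auto_offset_reset='earliest')

while True:
    # poll() returns a {TopicPartition: [messages]} dict containing whatever
    # arrived since the last committed offset for this group.
    records = consumer.poll(timeout_ms=1000)
    for tp, messages in records.items():
        for message in messages:
            info = json.loads(message.value)
            print(tp.topic, tp.partition, message.offset, info)
    time.sleep(300)  # wait 5 minutes before checking again

Because group_id is set and auto-commit is on by default, each poll picks up where the previous one left off, even across restarts.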

These days Kafka stores consumer offsets in an internal topic (__consumer_offsets) rather than in ZooKeeper.

Commit offset

We have two options:

1. Auto-commit the offset:

enable_auto_commit=True

2. Manually commit the offset:
from kafka import KafkaConsumer, TopicPartition, OffsetAndMetadata

# Create the consumer with enable_auto_commit set to False.
consumer = KafkaConsumer(topicName, group_id='test',
                         bootstrap_servers=bootstrap_servers,
                         enable_auto_commit=False)

for message in consumer:
    # ... process the message, then commit its offset. Note the + 1:
    # the committed offset is the next message the group will read.
    consumer.commit({TopicPartition(message.topic, message.partition):
                     OffsetAndMetadata(message.offset + 1, '')})

Follow @DennisLi's approach, or re-run the consumer after five minutes.
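
As a quick sanity check (a sketch, assuming the same group and topic as above), you can read back what was committed for a partition with consumer.committed():

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(group_id='test',
                         bootstrap_servers=['localhost:8080'],
                         enable_auto_commit=False)

# The committed offset is the position of the next message the 'test'
# group will read from partition 0 of 'test-info'; None if nothing
# has been committed yet.
tp = TopicPartition('test-info', 0)
print(consumer.committed(tp))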
