
Python Kafka consumer with offset management

I am a newbie to Kafka and I am trying to set up a consumer so that it reads messages published by a Kafka producer. Correct me if I am wrong, but the way I understand it, a Kafka consumer stores its offset in ZooKeeper? However, I don't have a ZooKeeper instance running, and I want to poll, say, every 5 minutes to see if any new messages have been published.

So far, the code that I have is:

import json
import sys

import kafka

bootstrap_servers = ['localhost:8080']
topicName = 'test-info'
consumer = kafka.KafkaConsumer(topicName,
                               group_id='test',
                               bootstrap_servers=bootstrap_servers,
                               auto_offset_reset='earliest')

count = 0
try:
    for message in consumer:
        print("\n")
        print("<>" * 20)
        print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition, message.offset, message.key, message.value))
        print("--" * 20)
        info = json.loads(message.value)

        if info['event'] == "new_record" and info['data']['userId'] == "user1" and info['data']['details']['userTeam'] == "foo":
            count = count + 1
            print(count, info['data']['details']['team'], info['data']['details']['leadername'], info['data']['details']['category'])
        else:
            print("Skipping")

    print(count)

except KeyboardInterrupt:
    sys.exit()

How can I save the offset so that the next time it polls, it reads only the new (incremental) data? Any pointers will help.

  1. It's true that the Kafka consumer stores offsets in ZooKeeper. Since you don't have ZooKeeper installed, Kafka is probably using its built-in ZooKeeper.

  2. In your case, you don't have to do anything more, as you have already set group_id='test'. The consumer will therefore automatically continue consuming from the last committed offset for that group, because it commits the latest offset automatically (enable_auto_commit is True by default). For more info you can check here.

  3. If you want to check every 5 minutes whether any new messages have been published, you can add time.sleep(300) to your consumer loop, as in the sketch below.
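
For example, a minimal polling sketch along those lines (assuming the broker address and topic from the question; adjust bootstrap_servers to wherever your broker actually listens):

import json
import time

from kafka import KafkaConsumer

consumer = KafkaConsumer('test-info',
                         group_id='test',
                         bootstrap_servers=['localhost:8080'],
                         auto_offset_reset='earliest')

while True:
    # poll() returns a {TopicPartition: [messages]} dict containing whatever
    # arrived since the last committed offset for this group.
    records = consumer.poll(timeout_ms=1000)
    for tp, messages in records.items():
        for message in messages:
            info = json.loads(message.value)
            print(tp.topic, tp.partition, message.offset, info)
    time.sleep(300)  # wait 5 minutes before checking again

Because group_id is set and auto-commit is on by default, each poll picks up where the previous one left off, even across restarts.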

These days Kafka stores consumer offsets in an internal topic (__consumer_offsets) rather than in ZooKeeper.

Commit offset

We have two options:

1. Auto-commit the offset:

enable_auto_commit=True

2. Manually commit the offset:
from kafka import KafkaConsumer, TopicPartition, OffsetAndMetadata

# Create the consumer with enable_auto_commit set to False.
consumer = KafkaConsumer(topicName, group_id='test',
                         bootstrap_servers=bootstrap_servers,
                         enable_auto_commit=False)

for message in consumer:
    # ... process the message, then commit its offset. Note the + 1:
    # the committed offset is the next message the group will read.
    consumer.commit({TopicPartition(message.topic, message.partition):
                     OffsetAndMetadata(message.offset + 1, '')})

Follow @DennisLi's approach, or re-run the consumer after five minutes.
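
As a quick sanity check (a sketch, assuming the same group and topic as above), you can read back what was committed for a partition with consumer.committed():

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(group_id='test',
                         bootstrap_servers=['localhost:8080'],
                         enable_auto_commit=False)

# The committed offset is the position of the next message the 'test'
# group will read from partition 0 of 'test-info'; None if nothing
# has been committed yet.
tp = TopicPartition('test-info', 0)
print(consumer.committed(tp))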
