How to get latest offset for a partition for a kafka topic?

I am using the Python high-level consumer for Kafka and want to know the latest offsets for each partition of a topic. However, I cannot get it to work.

from kafka import TopicPartition
from kafka.consumer import KafkaConsumer

con = KafkaConsumer(bootstrap_servers = brokers)
ps = [TopicPartition(topic, p) for p in con.partitions_for_topic(topic)]

con.assign(ps)
for p in ps:
    print "For partition %s highwater is %s"%(p.partition,con.highwater(p))

print "Subscription = %s"%con.subscription()
print "con.seek_to_beginning() = %s"%con.seek_to_beginning()

But the output I get is:

For partition 0 highwater is None
For partition 1 highwater is None
For partition 2 highwater is None
For partition 3 highwater is None
For partition 4 highwater is None
For partition 5 highwater is None
....
For partition 96 highwater is None
For partition 97 highwater is None
For partition 98 highwater is None
For partition 99 highwater is None
Subscription = None
con.seek_to_beginning() = None
con.seek_to_end() = None

I have an alternate approach using assign, but the result is the same:

con = KafkaConsumer(bootstrap_servers = brokers)
ps = [TopicPartition(topic, p) for p in con.partitions_for_topic(topic)]

con.assign(ps)
for p in ps:
    print "For partition %s highwater is %s"%(p.partition,con.highwater(p))

print "Subscription = %s"%con.subscription()
print "con.seek_to_beginning() = %s"%con.seek_to_beginning()
print "con.seek_to_end() = %s"%con.seek_to_end()

It seems from some of the documentation that I might get this behaviour if a fetch has not been issued, but I cannot find a way to force that. What am I doing wrong?

Or is there a different/simpler way to get the latest offsets for a topic?

Finally, after spending a day on this and several false starts, I was able to find a solution and get it working. Posting it here so that others may refer to it.

from kafka import SimpleClient
from kafka.common import OffsetRequestPayload

client = SimpleClient(brokers)

# Build one offset request per partition of the topic.
# time=-1 asks for the latest offset; max_offsets=1 returns a single value.
partitions = client.topic_partitions[topic]
offset_requests = [OffsetRequestPayload(topic, p, -1, 1) for p in partitions.keys()]

offsets_responses = client.send_offset_request(offset_requests)

for r in offsets_responses:
    print("partition = %s, offset = %s" % (r.partition, r.offsets[0]))

If you wish to use the Kafka shell scripts in kafka/bin, you can get the latest and smallest offsets with kafka-run-class.sh.

To get the latest offset, the command looks like this:

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --time -1 --topic topicname

To get the smallest offset, the command looks like this:

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --time -2 --topic topicname
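
If you want to pick those numbers up from a script, a small parser is enough. This is only a sketch: it assumes GetOffsetShell prints one topic:partition:offset line per partition (check the output of your Kafka version), and parse_get_offset_shell and the sample text are made-up names for illustration, not part of Kafka.

```python
def parse_get_offset_shell(output):
    """Parse 'topic:partition:offset' lines into {(topic, partition): offset}."""
    offsets = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right: the last two fields are partition and offset.
        topic, partition, offset = line.rsplit(":", 2)
        offsets[(topic, int(partition))] = int(offset)
    return offsets


# Hypothetical sample output for a three-partition topic.
sample = """topicname:0:1523
topicname:1:1498
topicname:2:1510
"""

latest = parse_get_offset_shell(sample)
for (topic, partition), offset in sorted(latest.items()):
    print("%s partition %d -> latest offset %d" % (topic, partition, offset))
```

In practice you would feed it the captured stdout of the command instead of the sample string.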

You can find more information on the Get Offsets Shell from the following link.

Hope this helps!

from kafka import KafkaConsumer, TopicPartition

TOPIC = 'MYTOPIC'
GROUP = 'MYGROUP'
BOOTSTRAP_SERVERS = ['kafka01:9092', 'kafka02:9092']

consumer = KafkaConsumer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    group_id=GROUP,
    enable_auto_commit=False
)

for p in consumer.partitions_for_topic(TOPIC):
    tp = TopicPartition(TOPIC, p)
    consumer.assign([tp])
    # committed() returns None if the group has never committed for this partition
    committed = consumer.committed(tp)
    # Move to the log end so that position() reports the latest offset
    consumer.seek_to_end(tp)
    last_offset = consumer.position(tp)
    lag = last_offset - committed if committed is not None else None
    print("topic: %s partition: %s committed: %s last: %s lag: %s" % (TOPIC, p, committed, last_offset, lag))

consumer.close(autocommit=False)
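
Keep in mind that consumer.committed(tp) returns None when the group has no committed offset for a partition, so any lag calculation has to allow for that. Here is a minimal sketch of the arithmetic on its own, working on plain dictionaries so it runs without a broker. compute_lag and the sample numbers are made up for illustration; in real use you would fill the dictionaries from consumer.position(tp) (after seek_to_end) and consumer.committed(tp).

```python
def compute_lag(end_offsets, committed_offsets):
    """Return {partition: lag}; lag is None where nothing has been committed."""
    lag = {}
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition)
        # No committed offset means there is no meaningful lag to report.
        lag[partition] = None if committed is None else end - committed
    return lag


# Illustrative values only.
end_offsets = {0: 120, 1: 95, 2: 40}
committed_offsets = {0: 100, 1: 95, 2: None}

print(compute_lag(end_offsets, committed_offsets))
# {0: 20, 1: 0, 2: None}
```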

With kafka-python>=1.3.4 you can use:

kafka.KafkaConsumer.end_offsets(partitions)

Get the last offset for the given partitions. The last offset of a partition is the offset of the upcoming message, i.e. the offset of the last available message + 1.

from kafka import TopicPartition
from kafka.consumer import KafkaConsumer

con = KafkaConsumer(bootstrap_servers = brokers)
ps = [TopicPartition(topic, p) for p in con.partitions_for_topic(topic)]

con.end_offsets(ps)
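
Since the end offset is the offset of the next message to be written, the last stored message sits at end - 1, and a partition whose end equals its beginning offset currently holds nothing. A rough sketch of that bookkeeping follows. It assumes dictionaries shaped like the results of end_offsets and beginning_offsets, keyed by partition number here for brevity (the real keys are TopicPartition objects). summarize is a made-up helper and the numbers are invented.

```python
def summarize(beginning, end):
    """For each partition return (last_message_offset, offset_span)."""
    summary = {}
    for partition, end_offset in end.items():
        start = beginning[partition]
        # Span of offsets still retained; on compacted topics fewer
        # messages than this may actually remain.
        span = end_offset - start
        # An empty partition has no "last message" to point at.
        last = end_offset - 1 if span > 0 else None
        summary[partition] = (last, span)
    return summary


beginning = {0: 0, 1: 250, 2: 40}
end = {0: 1000, 1: 400, 2: 40}

print(summarize(beginning, end))
# {0: (999, 1000), 1: (399, 150), 2: (None, 0)}
```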

Another way to achieve this is to poll the consumer to obtain the last consumed offset and then use the seek_to_end method to move to the most recent available offset of each partition.

from kafka import KafkaConsumer
consumer = KafkaConsumer('my-topic',
                     group_id='my-group',
                     bootstrap_servers=['localhost:9092'])
consumer.poll()
consumer.seek_to_end()

This method comes in particularly handy when using consumer groups.
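
To actually read the offsets back after seeking, you can ask for position() on each assigned partition. The sketch below wraps that in a helper that only relies on the assignment(), seek_to_end() and position() methods of kafka-python's KafkaConsumer. latest_positions and FakeConsumer are made-up names, and the fake class is there only so the snippet runs without a broker. One assumption to watch: with a subscribed consumer the assignment can still be empty right after the first poll(), in which case the helper returns an empty dict and you would poll again.

```python
def latest_positions(consumer):
    """Seek every assigned partition to its end and return {tp: position}."""
    assigned = consumer.assignment()
    if not assigned:
        # Group join has not completed yet; nothing to report.
        return {}
    consumer.seek_to_end(*assigned)
    return {tp: consumer.position(tp) for tp in assigned}


class FakeConsumer:
    """Stand-in exposing the same three methods, for demonstration only."""

    def __init__(self, end_offsets):
        self._end = dict(end_offsets)
        self._pos = {tp: 0 for tp in self._end}

    def assignment(self):
        return set(self._end)

    def seek_to_end(self, *partitions):
        for tp in partitions or self._end:
            self._pos[tp] = self._end[tp]

    def position(self, tp):
        return self._pos[tp]


fake = FakeConsumer({("my-topic", 0): 42, ("my-topic", 1): 7})
positions = latest_positions(fake)
for tp in sorted(positions):
    print(tp, positions[tp])
```

With a real consumer you would pass the KafkaConsumer instance itself, and the keys would be TopicPartition objects.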

SOURCES:

  1. https://kafka-python.readthedocs.io/en/master/apidoc/kafka.consumer.html#kafka.consumer.KafkaConsumer.poll
  2. https://kafka-python.readthedocs.io/en/master/apidoc/kafka.consumer.html#kafka.consumer.KafkaConsumer.seek_to_end

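Using confluent-kafka-python

You can use position:

Retrieve current positions (offsets) for the list of partitions.

Note that position reports the consumer's own current position, so it matches the latest offset only once the consumer has reached the end of the partition.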

from confluent_kafka import Consumer, TopicPartition


# group.id is a placeholder value; Consumer expects one to be configured
consumer = Consumer({"bootstrap.servers": "localhost:9092", "group.id": "my-group"})
topic = consumer.list_topics(topic='topicName')
partitions = [TopicPartition('topicName', partition) for partition in topic.topics['topicName'].partitions.keys()]

offset_per_partition = consumer.position(partitions)

Alternatively, you can use get_watermark_offsets, but you have to pass one partition at a time, so it requires multiple calls:

Retrieve low and high offsets for partition.

from confluent_kafka import Consumer, TopicPartition


# group.id is a placeholder value; Consumer expects one to be configured
consumer = Consumer({"bootstrap.servers": "localhost:9092", "group.id": "my-group"})
topic = consumer.list_topics(topic='topicName')
partitions = [TopicPartition('topicName', partition) for partition in topic.topics['topicName'].partitions.keys()]

for p in partitions:
    # The high watermark is the offset of the last message + 1
    low_offset, high_offset = consumer.get_watermark_offsets(p)
    print(f"Latest offset for partition {p.partition}: {high_offset}")
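
If you want the results in one place rather than printed, the per-partition calls can be wrapped in a helper. This is a sketch under a couple of assumptions: as far as I know get_watermark_offsets accepts a timeout and returns None when the request times out, so the helper skips those partitions. high_watermarks, FakePartition and FakeConsumer are made-up names; the fakes only stand in for confluent-kafka objects so the snippet runs without a broker.

```python
from collections import namedtuple


def high_watermarks(consumer, partitions, timeout=5.0):
    """Return {(topic, partition): high_offset}, skipping timed-out lookups."""
    result = {}
    for tp in partitions:
        watermarks = consumer.get_watermark_offsets(tp, timeout=timeout)
        if watermarks is None:
            # Lookup timed out; leave this partition out of the result.
            continue
        low, high = watermarks
        result[(tp.topic, tp.partition)] = high
    return result


# Stand-ins for demonstration only.
FakePartition = namedtuple("FakePartition", ["topic", "partition"])


class FakeConsumer:
    def __init__(self, watermarks):
        self._watermarks = watermarks

    def get_watermark_offsets(self, tp, timeout=None, cached=False):
        return self._watermarks.get((tp.topic, tp.partition))


fake = FakeConsumer({("topicName", 0): (0, 150), ("topicName", 1): (20, 90)})
parts = [FakePartition("topicName", p) for p in range(3)]

print(high_watermarks(fake, parts))
# {('topicName', 0): 150, ('topicName', 1): 90}
```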

Using kafka-python

You can use end_offsets:

Get the last offset for the given partitions. The last offset of a partition is the offset of the upcoming message, i.e. the offset of the last available message + 1.

This method does not change the current consumer position of the partitions.

from kafka import TopicPartition
from kafka.consumer import KafkaConsumer


consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
partitions = [TopicPartition('myTopic', p) for p in consumer.partitions_for_topic('myTopic')]
last_offset_per_partition = consumer.end_offsets(partitions)

kafka-consumer-groups --bootstrap-server host1:9093,crow-host2:9093,host3:9093 --command-config=/root/client.properties --describe --group atlas

This command will show the group's status, including lag and offsets.

Using kafka-python

While defining the consumer, the argument 'auto_offset_reset' can be set to either 'earliest' or 'latest'. This is useful in case the consumer starts after the retention period and/or restarts after a failure: messages will then be consumed according to the auto.offset.reset configuration.

from json import loads

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'my-topic',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='latest',
    enable_auto_commit=True,
    group_id='my-group',
    value_deserializer=lambda x: loads(x.decode('utf-8')))
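
To make the effect of auto_offset_reset concrete, here is a simplified model of where a consumer starts reading. It is not library code, just my reading of the rule: a committed offset that is still within the retained range is used, otherwise the reset policy decides. starting_offset and the numbers are made up for illustration.

```python
def starting_offset(committed, beginning, end, auto_offset_reset="latest"):
    """Model of the offset a consumer starts from for one partition."""
    # A committed offset that still lies in the retained range wins.
    if committed is not None and beginning <= committed <= end:
        return committed
    # Otherwise (nothing committed, or the offset has expired) the policy applies.
    if auto_offset_reset == "earliest":
        return beginning
    if auto_offset_reset == "latest":
        return end
    raise ValueError("unsupported auto_offset_reset: %r" % auto_offset_reset)


# Partition currently retains offsets 100..499, so the end offset is 500.
print(starting_offset(None, 100, 500, "latest"))    # 500
print(starting_offset(None, 100, 500, "earliest"))  # 100
print(starting_offset(250, 100, 500, "latest"))     # 250
print(starting_offset(50, 100, 500, "earliest"))    # 100
```

The last call shows the retention case: the committed offset 50 has already been deleted, so the consumer falls back to the policy.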

See example: https://towardsdatascience.com/kafka-python-explained-in-10-lines-of-code-800e3e07dad1
