
How to stop a Python Kafka consumer in a program?

I am writing a Python Kafka consumer (trying to use kafka.consumer.SimpleConsumer or kafka.consumer.simple.SimpleConsumer from http://kafka-python.readthedocs.org/en/latest/apidoc/kafka.consumer.html). When I run the following code, it keeps running forever, even after all messages have been consumed. I would like the consumer to stop once it has consumed all the messages. How can I do that? I also don't know how to use the stop() function (defined in the base class kafka.consumer.base.Consumer).

UPDATE

I used a signal handler to call consumer.stop(). Some error messages were printed to the screen, but the program was still stuck in the for loop: when new messages came in, the consumer consumed and printed them. I also tried client.close(), with the same result.

I need a way to stop the for loop gracefully.

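        from kafka import KafkaClient, SimpleConsumer  # legacy kafka-python 0.9.x API

        client = KafkaClient("localhost:9092")
        consumer = SimpleConsumer(client, "test-group", "test")

        consumer.seek(0, 2)  # whence=2 starts at the tail; seek(0, 0) starts at the head

        # Iteration blocks waiting for new messages, so this loop never ends on its own
        for message in consumer:
            print "Offset:", message.offset
            print "Value:", message.message.value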

Any help is welcome. Thanks.

We can first check the offset of the last message in the topic, then stop the loop once we reach that offset.

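    from kafka import KafkaConsumer, TopicPartition

    client = "localhost:9092"
    # a positional argument would be treated as a topic name, so pass the broker explicitly
    consumer = KafkaConsumer(bootstrap_servers=client)
    topic = 'test'
    tp = TopicPartition(topic, 0)
    # assign the partition manually (no consumer-group subscription)
    consumer.assign([tp])

    # obtain the last offset value: after seek_to_end(), position() is the
    # offset the next message will get, i.e. last message offset + 1
    consumer.seek_to_end(tp)
    lastOffset = consumer.position(tp)

    consumer.seek_to_beginning(tp)

    for message in consumer:
        print("Offset:", message.offset)
        print("Value:", message.value)
        if message.offset == lastOffset - 1:
            break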

Use the iter_timeout parameter to set how long to wait. If it is set to 10, as in the following code, the loop exits when no new message arrives within 10 seconds. The default value is None, which means the consumer blocks here indefinitely when no new messages come in.

        # iteration ends after 10 seconds without a new message
        consumer = SimpleConsumer(client, "test-group", "test",
                                  iter_timeout=10)

Update

The above is not a good method. When many messages keep coming in, it is hard to choose an iter_timeout small enough to guarantee that the loop stops. So I now use the get_message() function, which tries to consume a single message and then returns. It returns None when there are no new messages.
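A minimal sketch of that get_message() approach, assuming the legacy SimpleConsumer API from the question, where get_message(block=True, timeout=...) returns an (offset, message) pair, or None when nothing arrives within the timeout (assumed to be in seconds). The function name drain_consumer and the 1-second timeout are illustrative choices, not part of the original answer; the consumer is passed in, so the loop works with any object exposing a compatible get_message().

```python
def drain_consumer(consumer, timeout=1.0):
    """Consume messages until get_message() returns None, then stop.

    Hypothetical helper. Assumes a legacy kafka-python SimpleConsumer-style
    object whose get_message(block=True, timeout=...) returns an
    OffsetAndMessage-like pair (offset, message), or None when no message
    arrives within `timeout` seconds.
    """
    consumed = []
    while True:
        result = consumer.get_message(block=True, timeout=timeout)
        if result is None:
            # No new message within the timeout: treat the topic as drained
            break
        consumed.append(result)
        print("Offset:", result.offset)
        print("Value:", result.message.value)
    return consumed
```

With the question's setup this would be called as drain_consumer(SimpleConsumer(client, "test-group", "test")). Unlike iter_timeout, each call waits only for one message, but the same caveat applies: a steady stream of messages keeps the loop running.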

A similar solution to Mohit's answer, but using the consumer's end_offsets() function:

    from kafka import KafkaConsumer, TopicPartition

    # settings
    client = "localhost:9092"
    topic = 'test'

    # prepare consumer
    tp = TopicPartition(topic, 0)
    consumer = KafkaConsumer(bootstrap_servers=client)
    consumer.assign([tp])
    consumer.seek_to_beginning(tp)

    # obtain the last offset value: end_offsets() returns {TopicPartition: offset},
    # where offset is that of the next message to be written (last offset + 1)
    lastOffset = consumer.end_offsets([tp])[tp]

    for message in consumer:
        print("Offset:", message.offset)
        print("Value:", message.value)
        if message.offset == lastOffset - 1:
            break

    consumer.close()
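Both offset-based answers watch only partition 0. A hedged sketch of the same idea generalized to several partitions: keep consuming until every partition has delivered the record just before its end offset. The helper name consume_to_end and its parameters are hypothetical; it relies only on records exposing topic, partition and offset (as kafka-python's ConsumerRecord does) and on dicts keyed by (topic, partition). Because kafka-python's TopicPartition is a namedtuple, the dicts returned by end_offsets() and beginning_offsets() can be passed in directly.

```python
def consume_to_end(records, end_offsets, start_offsets):
    """Yield records until every partition reaches its end offset.

    Hypothetical helper.
    records:       iterable of objects with .topic, .partition and .offset
                   (e.g. a kafka-python KafkaConsumer being iterated).
    end_offsets:   {(topic, partition): offset of the next message to be written}
    start_offsets: {(topic, partition): first offset that will be read}
    Like the answers above, it assumes the record at end_offset - 1 is
    actually delivered to the consumer.
    """
    # Only partitions that actually have messages left to read
    pending = {
        tp: end
        for tp, end in end_offsets.items()
        if end > start_offsets.get(tp, 0)
    }
    if not pending:
        return  # nothing to read: avoid blocking on the iterator at all
    for record in records:
        yield record
        key = (record.topic, record.partition)
        end = pending.get(key)
        if end is not None and record.offset >= end - 1:
            del pending[key]  # this partition is fully read
            if not pending:
                break
```

With kafka-python, one would assign the partitions, call seek_to_beginning(*parts), and then iterate consume_to_end(consumer, consumer.end_offsets(parts), consumer.beginning_offsets(parts)). Passing the start offsets lets partitions that are empty (or whose old messages were deleted by retention) count as finished immediately instead of blocking forever.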

Simpler solution:

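Use poll() instead, with a timeout (the timeout_ms argument in kafka-python). Unlike iterating over the consumer, poll() does not block indefinitely: it returns after at most timeout_ms, possibly with no records.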

  • Create a counter variable outside your while loop.
  • Increment the counter every time poll() fetches 0 records from the Kafka brokers.
  • Reset the counter to 0 whenever poll() returns records.
  • If the counter reaches some threshold (say 10), break out of the loop and close the consumer.

This logic relies on the assumption that if poll() has not fetched any records in 10 consecutive calls, we have read all the data.
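A minimal sketch of that counter loop, assuming kafka-python's KafkaConsumer.poll(timeout_ms=...), which returns a dict mapping each TopicPartition to a list of records (empty when nothing arrived). The function name consume_until_idle and the default threshold and timeout values are illustrative, not from the original answer; the consumer is passed in so the loop logic does not depend on a live broker.

```python
def consume_until_idle(consumer, idle_threshold=10, timeout_ms=1000):
    """Poll until `idle_threshold` consecutive polls return no records.

    Hypothetical helper. Assumes a kafka-python style consumer whose
    poll(timeout_ms=...) returns {TopicPartition: [records]}.
    The consumer is closed when the loop ends.
    """
    consumed = []
    empty_polls = 0  # counter lives outside the loop
    try:
        while empty_polls < idle_threshold:
            batch = consumer.poll(timeout_ms=timeout_ms)
            records = [r for recs in batch.values() for r in recs]
            if not records:
                empty_polls += 1  # nothing fetched this time
                continue
            empty_polls = 0  # got data: reset the idle counter
            for record in records:
                consumed.append(record)
                print("Offset:", record.offset)
                print("Value:", record.value)
    finally:
        consumer.close()
    return consumed
```

The worst-case wait after the last message is roughly idle_threshold × timeout_ms (10 seconds with these defaults), so the two values trade shutdown latency against the risk of stopping during a short lull in traffic.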
