kafka-python consumer not receiving messages
I am having trouble getting KafkaConsumer to read from the beginning, or from any other explicit offset.
Running the command line tools for the consumer for the same topic, I do see messages with the --from-beginning option, and it hangs otherwise:
$ ./kafka-console-consumer.sh --zookeeper {localhost:port} --topic {topic_name} --from-beginning
If I run it through python, it hangs, which I suspect to be caused by incorrect consumer configs:
consumer = KafkaConsumer(topic_name,
                         bootstrap_servers=['localhost:9092'],
                         group_id=None,
                         auto_commit_enable=False,
                         auto_offset_reset='smallest')
print "Consuming messages from the given topic"
for message in consumer:
    print "Message", message
    if message is not None:
        print message.offset, message.value
print "Quit"
It prints "Consuming messages from the given topic" and hangs after that.
I am using kafka-python 0.9.5 and the broker runs kafka 8.2. Not sure what the exact problem is.
Set _group_id=None_ as suggested by dpkp to emulate the behavior of the console consumer.
The difference between the console consumer and the python consumer code you have posted is that the python consumer uses a consumer group to save offsets: group_id="test-consumer-group". If instead you set group_id=None, you should see the same behavior as the console consumer.
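A minimal sketch of that suggestion (the topic name is a placeholder, and this assumes a broker reachable at localhost:9092; on kafka-python 0.9.x the option spelled `'earliest'` below was `'smallest'`):

```python
from kafka import KafkaConsumer

# With no consumer group, offsets are never committed, so every run
# starts from wherever auto_offset_reset points -- the same behavior
# as kafka-console-consumer with --from-beginning.
consumer = KafkaConsumer(
    'my-topic',                           # placeholder topic name
    bootstrap_servers=['localhost:9092'],
    group_id=None,                        # disable offset commits
    auto_offset_reset='earliest',         # read from the beginning
)

for message in consumer:
    print(message.offset, message.value)
```

Note that the iteration blocks waiting for new records once the existing ones are consumed, which is expected; the "hang" in the question was it blocking at the *end* of the log because committed offsets were being reused.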
Setting auto_offset_reset='earliest' and group_id=None solved this for me.
I ran into the same problem: I can receive messages in the kafka console but can't get messages with a python script using the kafka-python package.
Finally I figured out the reason: I didn't call producer.flush() and producer.close() in my producer.py, which is not mentioned in its documentation.
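A minimal producer sketch illustrating that point (topic name and broker address are placeholders): send() is asynchronous, so a short-lived script can exit before anything is actually transmitted unless it flushes first.

```python
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

# send() only enqueues the record in an internal buffer and returns a
# future; the actual network I/O happens on a background thread.
producer.send('my-topic', b'hello')

# Block until every buffered record has really been sent...
producer.flush()
# ...then release the background thread and sockets.
producer.close()
```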
My take is: print and ensure the offset is what you expect it to be, by using position() and seek_to_beginning(); please see the comments in the code.
What I can't explain: after creating the KafkaConsumer, the partitions are not assigned; is this by design? The hack around it is to call poll() once before seek_to_beginning(). Also, sometimes the first call to poll() returns no data and doesn't change the offset. Code:
import kafka
print(kafka.__version__)
from kafka import KafkaProducer, KafkaConsumer
from kafka import TopicPartition
from time import sleep

KAFKA_URL = 'localhost:9092'        # kafka broker
KAFKA_TOPIC = 'sida3_sdtest_topic'  # topic name

# ASSUMING THAT the topic exists

# write to the topic
producer = KafkaProducer(bootstrap_servers=[KAFKA_URL])
for i in range(20):
    producer.send(KAFKA_TOPIC, ('msg' + str(i)).encode())
producer.flush()

# read from the topic
# auto_offset_reset='earliest' is only used when no offset is found, it's NOT what we need here
consumer = KafkaConsumer(KAFKA_TOPIC,
                         bootstrap_servers=[KAFKA_URL],
                         max_poll_records=2,
                         group_id='sida3')

# (!?) why do we need this to get partitions assigned?
# AssertionError: No partitions are currently assigned if poll() is not called
consumer.poll()
consumer.seek_to_beginning()

# also AssertionError: No partitions are currently assigned if poll() is not called
print('partitions of the topic: ', consumer.partitions_for_topic(KAFKA_TOPIC))

print('before poll() x2: ')
print(consumer.position(TopicPartition(KAFKA_TOPIC, 0)))
print(consumer.position(TopicPartition(KAFKA_TOPIC, 1)))

# (!?) sometimes the first call to poll() returns nothing and doesn't change the offset
messages = consumer.poll()
sleep(1)
messages = consumer.poll()

print('after poll() x2: ')
print(consumer.position(TopicPartition(KAFKA_TOPIC, 0)))
print(consumer.position(TopicPartition(KAFKA_TOPIC, 1)))
print('messages: ', messages)
Output:
2.0.1
partitions of the topic: {0, 1}
before poll() x2:
0
0
after poll() x2:
0
2
messages: {TopicPartition(topic='sida3_sdtest_topic', partition=1): [ConsumerRecord(topic='sida3_sdtest_topic', partition=1, offset=0, timestamp=1600335075864, timestamp_type=0, key=None, value=b'msg0', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=4, serialized_header_size=-1), ConsumerRecord(topic='sida3_sdtest_topic', partition=1, offset=1, timestamp=1600335075864, timestamp_type=0, key=None, value=b'msg1', headers=[], checksum=None, serialized_key_size=-1, serialized_value_size=4, serialized_header_size=-1)]}
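On the "why is poll() needed" question: with subscribe-style construction (passing the topic to the KafkaConsumer constructor), partitions are only assigned through the group coordinator, and that handshake happens lazily inside poll(). If you assign partitions manually instead, seek_to_beginning() works immediately; a sketch under the same topic/broker names as above:

```python
from kafka import KafkaConsumer, TopicPartition

# No topic in the constructor: we will assign partitions ourselves.
consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'])

# Manual assignment takes effect at once -- no group coordinator
# round-trip -- so no preliminary poll() is needed before seeking.
partitions = [TopicPartition('sida3_sdtest_topic', p)
              for p in consumer.partitions_for_topic('sida3_sdtest_topic')]
consumer.assign(partitions)
consumer.seek_to_beginning()

for tp in partitions:
    print(tp.partition, consumer.position(tp))  # positions now at the log start
```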
I faced the same issue before, so I ran kafka-topics locally on the machine running the code to test, and I got an UnknownHostException. I added the IP and the host name to the hosts file and it worked fine in both kafka-topics and the code. It seems that the KafkaConsumer was trying to fetch the messages but failed without raising any exceptions.
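For reference, the mapping goes into /etc/hosts on Linux/macOS (or C:\Windows\System32\drivers\etc\hosts on Windows); the IP and hostname below are placeholders standing in for whatever name the broker advertises:

```
# <broker-ip>   <advertised-broker-hostname>
192.168.1.50    kafka-broker-1
```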
For me, I had to specify the router's IP in the kafka PLAINTEXT configuration. Get the router's IP with:
echo $(ifconfig | grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}" | grep -v 127.0.0.1 | awk '{ print $2 }' | cut -f2 -d: | head -n1)
and then add PLAINTEXT_HOST://<router_ip>:9092 to the kafka advertised listeners. In case of a Confluent docker service, the configuration is as follows:
kafka:
  image: confluentinc/cp-kafka:7.0.1
  container_name: kafka
  depends_on:
    - zookeeper
  ports:
    - 9092:9092
    - 29092:29092
  environment:
    - KAFKA_BROKER_ID=1
    - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
    - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:29092,PLAINTEXT_HOST://172.28.0.1:9092
    - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    - KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT
    - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
and finally the python consumer is:
from kafka import KafkaConsumer
from json import loads

consumer = KafkaConsumer(
    'my-topic',
    bootstrap_servers=['172.28.0.1:9092'],
    auto_offset_reset='earliest',
    group_id=None,
)

print('Listening')
for msg in consumer:
    print(msg)