I have a Kafka producer and consumer in Python. I want to consume the producer's messages in batches, let's say 2 at a time. From the producer, I have been sending email data like the following:
[{
"email" : "sukhi215c@gmail.com",
"subject": "Test 1",
"message" : "this is a test"
},
{
"email" : "sukhi215c@gmail.com",
"subject": "Test 2",
"message" : "this is a test"
},
{
"email" : "sukhi215c@gmail.com",
"subject": "Test 3",
"message" : "this is a test"
},
{
"email" : "sukhi215c@gmail.com",
"subject": "Test 4",
"message" : "this is a test"
}]
I am trying to consume this data in batches. I want to consume 2 messages at a time, send emails based on those 2 messages, and then consume the next set. The workaround that I tried is:
consumer = KafkaConsumer(topic, bootstrap_servers=[server], api_version=(0, 10))
for message in consumer[:2]:
    string = message.value.decode("utf-8")
    dict_value = ast.literal_eval(string)
The error that I am getting is:
for message in consumer[:2]:
TypeError: 'KafkaConsumer' object is not subscriptable
Can someone help me get through this?
The consumer is not a collection, and its iterator is infinite, so it cannot be sliced.
If you want to perform an action every two events, use a counter or your own list:
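from kafka import KafkaConsumer

data = []
consumer = KafkaConsumer(topic, bootstrap_servers=[server], api_version=(0, 10))
for message in consumer:
    data.append(message)
    if len(data) >= 2:
        action(data)  # process the batch of two, e.g. send the emails
        data.clear()  # reset the list for the next batch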
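The same idea can be packaged as a small generator, shown here as a rough sketch rather than a definitive implementation. The names batched and decode_value are hypothetical helpers, not part of kafka-python, and the decoding step assumes each message value is a single UTF-8 encoded JSON object like the ones in the question.

```python
import json


def batched(messages, size=2):
    """Yield lists of up to `size` items from any iterable of messages."""
    batch = []
    for message in messages:
        batch.append(message)
        if len(batch) >= size:
            yield batch
            batch = []  # start a fresh list so the yielded batch is not mutated
    if batch:
        # Only reached if the iterable ends; yields the trailing partial batch.
        yield batch


def decode_value(raw_bytes):
    """Decode one message value, assuming it is UTF-8 encoded JSON."""
    return json.loads(raw_bytes.decode("utf-8"))


# Hypothetical usage with a real consumer (not executed here):
# for batch in batched(consumer, size=2):
#     emails = [decode_value(message.value) for message in batch]
#     action(emails)
```

Because the consumer's iterator normally never ends, the trailing partial batch is only yielded if iteration stops, for example when the consumer is created with consumer_timeout_ms set.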
Use the poll() interface, documented here:
https://kafka-python.readthedocs.io/en/master/_modules/kafka/consumer/group.html#KafkaConsumer.poll
This allows you to set a timeout so the call returns early if there are no messages to consume, and its max_records argument caps how many records a single call returns.
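A minimal sketch of how poll() could be used for batches of 2 follows. The function consume_in_batches and its handle_batch and max_polls parameters are hypothetical names introduced for illustration. The only kafka-python call used is consumer.poll(timeout_ms=..., max_records=...), which returns a dict mapping each partition to a list of records.

```python
def consume_in_batches(consumer, handle_batch, batch_size=2,
                       timeout_ms=1000, max_polls=None):
    """Poll `consumer` and pass up to `batch_size` records to `handle_batch`.

    `max_polls` is an illustrative safety valve; None means poll forever.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        # poll() returns {partition: [records]}, or an empty dict on timeout.
        records_by_partition = consumer.poll(timeout_ms=timeout_ms,
                                             max_records=batch_size)
        # Flatten the per-partition lists into one batch.
        batch = [record
                 for records in records_by_partition.values()
                 for record in records]
        if batch:
            handle_batch(batch)


# Hypothetical usage with a real consumer (not executed here):
# from kafka import KafkaConsumer
# consumer = KafkaConsumer(topic, bootstrap_servers=[server])
# consume_in_batches(consumer, action, batch_size=2)
```

Unlike the list-based loop, a poll that times out with only one message available hands over a batch of one, so a final odd message is not left waiting for a partner.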