
Faster way to consume all messages in a Kafka topic

Our team is integrating Kafka with a Flask application to display data in real time, but we'd also like to display historical data from Kafka.

The idea, then, is to consume all messages from specific topics and display the data to our users. However, when we set up our Avro consumer to poll an entire topic's messages, we can only consume 100k-200k messages per minute, which is far too slow since we have around 2.5 million messages per topic. Even when we set up multiple consumers with the same group id, we don't see much of a performance improvement.

Any tips on how to get all messages from a Kafka topic faster? Or would it be better to save the data to a database and query it from there?

Our consumer:

from datetime import datetime

from confluent_kafka import Consumer

c = Consumer({
    'bootstrap.servers': 'brokers:9092',
    'group.id': 'consume_all_topics',
    'auto.offset.reset': 'earliest'
})

c.subscribe(['mytopic'])

now = datetime.now()
msg = c.poll(5.0)
# poll() returns None on timeout, so guard before calling value()
while msg is not None and msg.value()['timestamp'] < now:
    msg = c.poll(5.0)
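One change that often helps with confluent-kafka is pulling messages in batches with consume() rather than one poll() call per message, which cuts per-message overhead. A minimal sketch under the question's broker and topic names; drain() and historical() are our own illustrative helpers, not library APIs:

```python
def historical(values, cutoff):
    """Keep only deserialized messages stamped before the cutoff."""
    return [v for v in values if v['timestamp'] < cutoff]


def drain(consumer, cutoff, batch_size=500):
    """Consume in batches until a timeout suggests we are caught up."""
    out = []
    while True:
        batch = consumer.consume(num_messages=batch_size, timeout=5.0)
        if not batch:
            break  # nothing arrived within the timeout: assume end of backlog
        values = [m.value() for m in batch if m.error() is None]
        out.extend(historical(values, cutoff))
    return out


if __name__ == '__main__':
    from datetime import datetime

    from confluent_kafka import Consumer  # requires confluent-kafka

    c = Consumer({
        'bootstrap.servers': 'brokers:9092',   # broker address from the question
        'group.id': 'consume_all_topics',
        'auto.offset.reset': 'earliest',
    })
    c.subscribe(['mytopic'])
    print(len(drain(c, datetime.now())))
```

The timeout-based stop condition is a heuristic; for an exact end point you could compare the consumer's position against get_watermark_offsets() per partition.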

"Even when we set up multiple consumers with the same group id, we don't see much of a performance improvement.

Any tips on how to get all messages from a Kafka topic faster?"

Kafka consumption scales with the number of partitions in a topic. Keep in mind that within a consumer group, each partition can be consumed by only one consumer. You will get the best consumer performance when the number of partitions matches the number of consumers in the group: adding more consumers than partitions leaves the extras idle, which is why adding consumers with the same group id gave you no improvement.
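To see why this caps parallelism, here is a small pure-Python sketch mimicking the default range assignor (range_assign is our own illustrative function, not a Kafka API): partitions are split into contiguous chunks, and once consumers outnumber partitions the extras receive nothing.

```python
def range_assign(num_partitions, consumer_ids):
    """Mimic Kafka's range assignor: hand out contiguous chunks of
    partitions, with the first consumers taking one extra partition
    when the count does not divide evenly."""
    consumers = sorted(consumer_ids)
    per, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, cid in enumerate(consumers):
        count = per + (1 if i < extra else 0)
        assignment[cid] = list(range(start, start + count))
        start += count
    return assignment


# 6 partitions over 4 consumers: everyone gets work
print(range_assign(6, ['c1', 'c2', 'c3', 'c4']))
# 3 partitions over 4 consumers: one consumer gets no partitions at all
print(range_assign(3, ['c1', 'c2', 'c3', 'c4']))
```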

In addition, throughput can be improved by compressing your data (e.g. with zstd, supported since Kafka 2.1). Note that compression should ideally be configured on the producer side; consumers decompress transparently.
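With confluent-kafka, compression is a single producer-side config key. A minimal fragment, reusing the broker address from the question as a placeholder:

```python
from confluent_kafka import Producer

# compression.type is set on the producer; brokers and consumers
# decompress transparently. 'zstd' needs Kafka 2.1+ on brokers and
# clients; 'lz4' or 'snappy' are alternatives on older clusters.
p = Producer({
    'bootstrap.servers': 'brokers:9092',   # placeholder from the question
    'compression.type': 'zstd',
    'linger.ms': 50,   # a small batching delay also improves compression ratio
})
```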
