
How to efficiently process TBs of data in a topic with Kafka Streams

I have a simple question related to Kafka. I hope I will get some good answers here.

I have a Kafka Streams application in which I want to handle a simple scenario: maintaining a state store for storing and querying data. The topic holds TBs of data that I want to process. I want to create a state store whose key and value differ from the topic's key and value: the store key will be a part of the topic's value field, and the store value will be something else. So for this purpose I have to read data from the Kafka topic, deserialize each value, and extract the part of the data that will serve as the store key.

My questions:

1) What would be the best way to approach this task, given that the topic has TBs of data and processing every record in the topic can be costly?

2) Which topology (DSL, Processor API, or a mix of both) best suits this scenario, and why?

@Parkash based on your question, here is a rough idea that you can use. (Please edit your question or provide some examples to get a more specific answer.)

  1. Irrespective of the amount of data in your source topic, if the topic has been partitioned appropriately, you should be able to parallelize the reads. Please refer to the Streams threading model here: https://kafka.apache.org/23/documentation/streams/architecture#streams_architecture_threads (a small configuration sketch follows this list).

  2. You will need to read all the key-value pairs. I do not see a materialization option in any of the stateless operations (from your question it looks like you are trying to do only stateless operations), so I suppose you will need to use the Processor API to build your state store (see the sketch after this list).
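For point 1, here is a minimal configuration sketch, assuming a recent Kafka Streams version; the application id, bootstrap server, and thread count below are placeholders you would adapt:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsProps {
    public static Properties buildConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rekey-store-app");   // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        // One consumer runs per stream thread; effective parallelism is capped by
        // the number of partitions on the source topic, so size the partition count first.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 8);
        return props;
    }
}
```

You can also run multiple instances of the application with the same application id; Streams will spread the partitions across all instances and their threads.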
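For point 2, here is a rough Processor API sketch of the re-keyed state store, assuming Kafka Streams 2.7+ (the typed Processor API), string serdes, and a hypothetical extraction rule (the first comma-separated field of the value becomes the store key); the topic and store names are placeholders:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class RekeyStoreTopology {

    public static Topology build() {
        Topology topology = new Topology();
        topology.addSource("source", new StringDeserializer(), new StringDeserializer(),
                "input-topic"); // hypothetical topic name
        topology.addProcessor("rekey", RekeyProcessor::new, "source");
        // Persistent key-value store, connected to the "rekey" processor.
        topology.addStateStore(
                Stores.keyValueStoreBuilder(
                        Stores.persistentKeyValueStore("rekeyed-store"), // hypothetical store name
                        Serdes.String(), Serdes.String()),
                "rekey");
        return topology;
    }

    // Reads each record, derives the store key from the value, and writes it to the store.
    static class RekeyProcessor implements Processor<String, String, Void, Void> {
        private KeyValueStore<String, String> store;

        @Override
        public void init(ProcessorContext<Void, Void> context) {
            store = context.getStateStore("rekeyed-store");
        }

        @Override
        public void process(Record<String, String> record) {
            // Hypothetical extraction rule: the first comma-separated field of the
            // value becomes the store key; adapt this to your actual value format.
            String newKey = record.value().split(",")[0];
            store.put(newKey, record.value());
        }
    }
}
```

The store built this way can then be exposed for reads via interactive queries on the running KafkaStreams instance.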
