
Spark continuous processing mode does not read all Kafka topic partitions

I'm experimenting with Spark's Continuous Processing mode in Structured Streaming. I'm reading from a Kafka topic with 2 partitions, while the Spark application has only one executor with one core.

The application is a simple one: it reads from the first topic and publishes to the second. The problem is that my console consumer reading from the second topic sees messages from only one partition of the first topic. This means my Spark application is reading from only one partition of the topic.

How can I make my Spark application read from both partitions of the topic?

Note

I'm asking this question for people who might run into the same issue as me.

I found the answer to my question in the caveats section of the Spark Structured Streaming documentation.

Basically, in continuous processing mode Spark launches long-running tasks, each of which reads from one partition of the topic. Since only one task can run per core, the Spark application needs at least as many cores as the number of Kafka topic partitions it reads from.
