
Spark continuous processing mode does not read all Kafka topic partitions

I'm experimenting with Spark's Continuous Processing mode in Structured Streaming. I'm reading from a Kafka topic with 2 partitions, while the Spark application has only one executor with one core.

The application is a simple one: it reads from the first topic and publishes to the second. The problem is that my console consumer reading from the second topic sees messages from only one partition of the first topic. This means my Spark application is reading from only one partition of the topic.

How can I make my Spark application read from both partitions of the topic?

Note

I'm asking this question for people who might run into the same issue as me.

I found the answer to my question in the caveats section of the Spark Structured Streaming documentation.

Basically, in continuous processing mode Spark launches long-running tasks, each of which reads from one partition of the topic. Since only one task can run per core, the Spark application needs at least as many cores as the number of Kafka topic partitions it reads from.
