
How to optimize the number of executor instances in a Spark Structured Streaming app?

Runtime

YARN cluster mode

Application

  • Spark Structured Streaming
  • Reads data from a Kafka topic

About Kafka topic

  • 1 topic with 4 partitions, for now (the number of partitions can be changed)
  • At most 2,000 records per second are written to the topic.

I've found out that the number of Kafka topic partitions should match the number of Spark executors (1:1).
So, from what I know so far, 4 Spark executors seems to be the answer.
But I'm worried about data throughput: can 2,000 records/sec be sustained?

Is there any guidance or recommendation for setting a proper configuration in Spark Structured Streaming?
In particular spark.executor.cores, spark.executor.instances, or other executor-related settings.

Setting spark.executor.cores to 5 or less is usually considered optimal for HDFS I/O throughput. You can read more about it here (or search for other articles): https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
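
For example, under that rule of thumb, a layout with 4 cores per executor could be declared like this. This is a minimal sketch with illustrative values (app name, memory size, and instance count are assumptions, not recommendations); on YARN these settings are usually passed to spark-submit instead, and they only take effect if set before the SparkContext is created:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only -- tune them for your cluster.
// Cores per executor are kept at or below 5, per the rule of
// thumb from the Cloudera article cited above.
val spark = SparkSession.builder()
  .appName("structured-streaming-app") // hypothetical app name
  .config("spark.executor.instances", "1")
  .config("spark.executor.cores", "4")
  .config("spark.executor.memory", "4g")
  .getOrCreate()
```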

Each Kafka partition is matched to a Spark core, not an executor (one Spark core can read multiple Kafka partitions, but each Kafka partition is read by exactly one core).
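
As a sketch of what that looks like in code (the broker address and topic name are hypothetical), the Structured Streaming Kafka source below gets one task per topic partition by default:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// By default the Kafka source creates one Spark task per Kafka
// topic partition, so a 4-partition topic is read by 4 tasks,
// each occupying one core while it runs.
val kafkaStream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
  .option("subscribe", "my-topic")                   // hypothetical topic name
  // .option("minPartitions", "8") // Spark 2.4+: fan partitions out over more tasks
  .load()
```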

Deciding on the exact numbers you need depends on many other things, such as your application flow (e.g. if you are not doing any shuffle, the total number of cores should exactly equal the number of Kafka partitions), memory capacity and requirements, and so on.
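
For instance, with the 4-partition topic from the question and no shuffle, any executor layout whose total core count equals 4 would fit that guideline. A toy sizing check, assuming the no-shuffle case:

```scala
// Total cores (instances * cores per executor) should equal the
// number of Kafka partitions when there is no shuffle.
val kafkaPartitions  = 4
val coresPerExecutor = 2 // e.g. 2 executors x 2 cores, or 1 x 4
val executorInstances = kafkaPartitions / coresPerExecutor
assert(executorInstances * coresPerExecutor == kafkaPartitions)
```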

You can experiment with the configuration and use Spark metrics to decide whether your application is handling the throughput.
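
One way to watch those metrics from inside the application is a StreamingQueryListener, which reports per-micro-batch input and processing rates. A minimal sketch (the println sink is just for illustration; in practice you would forward these to your metrics system):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val spark = SparkSession.builder().getOrCreate()

// Logs per-micro-batch rates; if processedRowsPerSecond stays
// below inputRowsPerSecond, the job is falling behind.
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(e: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()
  override def onQueryProgress(e: QueryProgressEvent): Unit = {
    val p = e.progress
    println(s"input ${p.inputRowsPerSecond} rows/s, processed ${p.processedRowsPerSecond} rows/s")
  }
})
```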
