
Spark performance tuning - number of executors vs number of cores

I have two questions around performance tuning in Spark:

  1. I understand that one of the key factors controlling parallelism in a Spark job is the number of partitions in the RDD being processed, together with the number of executors and cores processing those partitions. Can I assume the following to be true:

    • # of executors * # of executor cores should be <= # of partitions, i.e. one partition is always processed on one core of one executor, so there is no point in having more executors * cores than the number of partitions.
  2. I understand that having a high number of cores per executor can have a negative impact on things like HDFS writes, but here is my second question: purely from a data-processing point of view, what is the difference between the two? For example, if I have a 10-node cluster, what would be the difference between these two jobs (assuming there's ample memory per node to process everything):

    1. 5 executors * 2 executor cores

    2. 2 executors * 5 executor cores

    Assuming infinite memory and CPU, should we expect the above two to perform the same from a performance point of view? (See the configuration sketch after this list.)
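A minimal sketch of the two configurations from point 2, assuming a YARN or standalone deployment where `spark.executor.instances` is honored; the memory size, app name, and partition count are placeholder values for illustration only:

```scala
import org.apache.spark.sql.SparkSession

// Config A: 5 executors * 2 cores each = 10 task slots.
val spark = SparkSession.builder()
  .appName("executor-vs-core-sketch")
  .config("spark.executor.instances", "5")
  .config("spark.executor.cores", "2")
  .config("spark.executor.memory", "8g") // placeholder size, not from the question
  .getOrCreate()

// Config B would simply swap the two numbers:
//   .config("spark.executor.instances", "2")
//   .config("spark.executor.cores", "5")

// Point 1 in practice: each task occupies one core of one executor, so a stage
// keeps all 10 slots busy only if it has at least 10 partitions.
val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 40)
println(s"partitions = ${rdd.getNumPartitions}, total task slots = ${5 * 2}")
```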

Most of the time, using larger executors (more memory, more cores) is better. First, a larger executor with more memory can easily support broadcast joins and do away with shuffles. Second, since tasks are not created equal, larger executors statistically have a better chance of surviving OOM issues. The only problem with large executors is GC pauses; G1GC helps.
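To make this concrete, here is a rough sketch of the kind of settings the answer alludes to; the heap size, broadcast threshold, and core count are assumed values for illustration, not figures from the answer:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("large-executor-sketch")
  .config("spark.executor.memory", "24g")   // large heap per executor (assumed size)
  .config("spark.executor.cores", "5")
  // A generous threshold lets small tables be broadcast, avoiding a shuffle join.
  .config("spark.sql.autoBroadcastJoinThreshold", 100L * 1024 * 1024)
  // G1GC to keep pause times manageable on a big heap.
  .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
  .getOrCreate()

// With the threshold above, a join against a small dimension table can be
// planned as a broadcast join instead of a shuffle:
//   largeDf.join(org.apache.spark.sql.functions.broadcast(smallDf), "key")
```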

In my experience, with a cluster of 10 nodes I would go for 20 Spark executors. The details of the job matter a lot, so some testing will help determine the optimal configuration.
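As an illustration of how the "20 executors on 10 nodes" rule of thumb might turn into a concrete configuration, here is a sketch; the per-node core and memory counts and the overhead deductions are assumptions, not taken from the answer:

```scala
import org.apache.spark.sql.SparkSession

val nodes            = 10
val executorsPerNode = 2              // the rule of thumb above
val coresPerNode     = 16             // assumed hardware
val memPerNodeGb     = 64             // assumed hardware

val executors        = nodes * executorsPerNode              // 20
val coresPerExecutor = coresPerNode / executorsPerNode - 1   // leave a core for OS/daemons
val memPerExecutorGb = memPerNodeGb / executorsPerNode - 4   // leave headroom for overhead

val spark = SparkSession.builder()
  .appName("twenty-executor-sketch")
  .config("spark.executor.instances", executors.toString)
  .config("spark.executor.cores", coresPerExecutor.toString)
  .config("spark.executor.memory", s"${memPerExecutorGb}g")
  .getOrCreate()
```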
