
Why do we need more executors than number of machines in Spark?

Original 2018-10-05 18:37:27 6 1 scala/ apache-spark/ pyspark

What's the logic behind requesting more executors than machines available in your cluster?

In the ideal situation, we would like to have 1 executor (= 1 JVM) on each of our machines, rather than several per machine.
If that is not the case, then why?

Thanks in advance

1 answer

In the ideal situation, we would like to have 1 executor (= 1 JVM) on each of our machines, rather than several per machine.

Not necessarily. Depending on the amount of available memory and the JVM implementation, several separate virtual machines per host can be a much better option, in particular to:

Improve memory management on large machines - see for example Why 35GB Heap is Less Than 32GB – Java JVM Memory Oddities.
Improve fault tolerance with unstable workloads - if one JVM fails, you lose the work of every thread running in it, so smaller executors limit how much is lost in a single failure.
Minimize the effort required for GC tuning - very large heaps can be extremely painful to tune.
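
As a rough illustration of the first point, here is a minimal sketch in plain Python (no Spark required) that splits each machine into the fewest executors whose heap stays under a chosen cap. The function name `plan_executors`, its parameters, and the 31 GiB cap are assumptions made for this example, not something from the original answer; the cap is only meant to stay below the roughly 32 GB heap size discussed in the linked article. The output keys are the standard Spark properties `spark.executor.instances`, `spark.executor.memory` and `spark.executor.cores`.

```python
import math

# Assumption for this sketch: keep each executor heap below ~32 GiB.
MAX_HEAP_GIB = 31


def plan_executors(machines, mem_gib_per_machine, cores_per_machine,
                   max_heap_gib=MAX_HEAP_GIB):
    """Return Spark settings that split each machine into several executors.

    mem_gib_per_machine is the memory you are willing to give to executor
    heaps on one machine (hypothetical input; leave room for the OS and
    off-heap overhead yourself).
    """
    # Fewest executors per machine such that each heap fits under the cap.
    per_machine = max(1, math.ceil(mem_gib_per_machine / max_heap_gib))
    # Never plan more executors than there are cores to run them.
    per_machine = min(per_machine, cores_per_machine)

    heap_gib = mem_gib_per_machine // per_machine
    cores = cores_per_machine // per_machine

    return {
        "spark.executor.instances": str(machines * per_machine),
        "spark.executor.memory": f"{heap_gib}g",
        "spark.executor.cores": str(cores),
    }


if __name__ == "__main__":
    # 4 machines, 120 GiB usable for heaps and 16 cores on each.
    for key, value in plan_executors(4, 120, 16).items():
        print(f"--conf {key}={value}")
```

The printed lines could be passed to `spark-submit` as `--conf` options. With these inputs the plan asks for more executors than machines, which is exactly the situation the question is about; whether the numbers are sensible for a real cluster depends on the workload and the cluster manager.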

Related questions

Number of Executors in Spark Local Mode

Spark on AWS EMR: use more executors

Spark streaming uses lesser number of executors

Why do we need to add “fork in run := true” when running Spark SBT application?

What is a good number of partitions in spark as a function of number of executors and threads?

How to determine number of partitons of rdd in spark given the number of cores and executors ?

Spark Standalone cluster mode need jar in all executors

In Spark Window functions, Why we need to use drop() at the end

Do we need Lzocodec for groupBy function in Scala Spark?

Why do we need flatMap (in general)?


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 1 2018-10-05 19:38:05
