
Why do we need more executors than the number of machines in Spark?

Original post: 2018-10-05 18:37:27 · 7 · 1 · Tags: scala, apache-spark, pyspark

What's the logic behind requesting more executors than there are machines in your cluster?

In the ideal situation, we would like to have 1 executor (= 1 JVM) on each machine, not several per machine. If not, why not?

Thanks in advance

1 Answer

In the ideal situation, we would like to have 1 executor (=1 jvm) at each of our machines, and not few in each machine.

Not necessarily. Depending on the amount of available memory and the JVM implementation, running several separate JVMs (executors) per machine can be a much better option, in particular to:

Improve memory management on large machines: a HotSpot JVM can use compressed object pointers (compressed oops) only while the heap stays below roughly 32 GB, so a heap slightly above that limit can effectively hold less data than one slightly below it. See, for example, Why 35GB Heap is Less Than 32GB – Java JVM Memory Oddities.
Improve fault tolerance with unstable workloads: if one JVM fails, you lose the work of all tasks (threads) running inside it, so keeping executors smaller limits the damage.
Minimize the effort required for GC tuning: very large heaps can be extremely painful to tune.
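The first two points can be made concrete with a back-of-the-envelope calculation. The sketch below is not from the answer: `plan_layout` and `ExecutorLayout` are hypothetical helpers, the ~32 GB compressed-oops cutoff is approximate (it depends on the JVM version and flags), and the 10% overhead assumes Spark's default factor for `spark.executor.memoryOverhead`. It splits one worker node evenly into k executors and reports the heap each executor gets, whether that heap stays under the compressed-oops threshold, and what share of the node's running tasks a single JVM crash would take down.

```python
import math
from dataclasses import dataclass
from typing import Dict

# Assumption: HotSpot can use compressed oops (32-bit object references) only
# while the max heap stays below roughly 32 GB; the exact cutoff depends on the
# JVM version and flags, so treat this as an approximate threshold.
COMPRESSED_OOPS_LIMIT_GB = 32

# Assumption: off-heap overhead of ~10% of the heap, mirroring Spark's default
# for spark.executor.memoryOverhead (its 384 MiB floor is ignored here because
# it only matters for heaps smaller than about 3.75 GB).
OVERHEAD_FRACTION = 0.10


@dataclass
class ExecutorLayout:
    executors_per_node: int
    cores_per_executor: int
    heap_gb: int  # candidate value for spark.executor.memory

    @property
    def below_compressed_oops_limit(self) -> bool:
        return self.heap_gb < COMPRESSED_OOPS_LIMIT_GB

    @property
    def share_of_node_tasks_lost(self) -> float:
        # When one executor JVM dies, every task running inside it fails.
        # With k equal executors per node, that is 1/k of the node's tasks.
        return 1 / self.executors_per_node

    def to_spark_conf(self, num_nodes: int) -> Dict[str, str]:
        # The total executor count deliberately exceeds the number of machines.
        return {
            "spark.executor.instances": str(num_nodes * self.executors_per_node),
            "spark.executor.cores": str(self.cores_per_executor),
            "spark.executor.memory": f"{self.heap_gb}g",
        }


def plan_layout(node_cores: int, node_mem_gb: int,
                executors_per_node: int) -> ExecutorLayout:
    """Hypothetical helper: split one worker node evenly into executors.

    node_mem_gb is the memory the cluster manager may hand out to executors,
    i.e. already excluding what the OS and node daemons need.
    """
    cores = node_cores // executors_per_node
    mem_per_executor = node_mem_gb / executors_per_node
    # heap plus ~10% overhead must fit into each executor's share of memory
    heap_gb = math.floor(mem_per_executor / (1 + OVERHEAD_FRACTION))
    return ExecutorLayout(executors_per_node, cores, heap_gb)


if __name__ == "__main__":
    # Hypothetical worker node: 32 cores and 120 GB usable by executors.
    for k in (1, 2, 4):
        lay = plan_layout(node_cores=32, node_mem_gb=120, executors_per_node=k)
        print(f"{k} executor(s)/node: {lay.cores_per_executor} cores, "
              f"{lay.heap_gb} GB heap, "
              f"compressed oops: {lay.below_compressed_oops_limit}, "
              f"tasks lost if one JVM dies: {lay.share_of_node_tasks_lost:.0%}")
```

With this hypothetical 32-core / 120 GB node, one executor per node gets a 109 GB heap (no compressed oops, and a crash loses 100% of the node's running tasks), while four executors per node get 27 GB heaps each, stay under the threshold, and a single crash costs 25%. On a 10-node cluster, `to_spark_conf(10)` for the four-way split would request 40 executors, i.e. more executors than machines.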

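Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.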

Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM