

Why do we need more executors than the number of machines in Spark?

What's the logic behind requesting more executors than there are machines available in your cluster?

In the ideal situation, we would like to have 1 executor (= 1 JVM) on each of our machines, and not several on each machine.
If not, then why?

Thanks in advance.

"In the ideal situation, we would like to have 1 executor (= 1 JVM) on each of our machines, and not several on each machine."

Not necessarily. Depending on the amount of available memory and the JVM implementation, separate virtual machines can be a much better option, in particular to:

  • Improve memory management on large machines: see, for example, "Why 35GB Heap is Less Than 32GB – Java JVM Memory Oddities".
  • Improve fault tolerance with unstable workloads: if one JVM fails, you lose the work of all its threads, so keeping each JVM smaller limits how much is lost at once.
  • Minimize the effort required for GC tuning: very large instances can be extremely painful to tune.
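
As a rough illustration of the points above, here is a minimal sketch of how one might split each machine into several smaller executors. It is not part of Spark: the helper `plan_executors` and its parameters are hypothetical, and the default cap of 30 GB per heap is an assumption chosen to stay under the roughly 32 GB threshold that the linked article discusses. It also assumes `mem_per_machine_gb` is the memory already set aside for executor heaps, ignoring OS and overhead reservations. Only the three property names it returns (`spark.executor.instances`, `spark.executor.memory`, `spark.executor.cores`) are real Spark settings.

```python
import math


def plan_executors(num_machines, mem_per_machine_gb, cores_per_machine, max_heap_gb=30):
    """Hypothetical helper: split each machine into the fewest executors
    whose heap stays at or below max_heap_gb (whole gigabytes assumed)."""
    if min(num_machines, mem_per_machine_gb, cores_per_machine, max_heap_gb) <= 0:
        raise ValueError("all arguments must be positive")

    # Fewest executors per machine that keeps each heap at or below the cap.
    per_machine = math.ceil(mem_per_machine_gb / max_heap_gb)
    # Never plan more executors than cores, so every executor gets at least
    # one core. If this limit applies, the heap may exceed the cap.
    per_machine = min(per_machine, cores_per_machine)

    heap_gb = mem_per_machine_gb // per_machine
    cores = cores_per_machine // per_machine

    return {
        "spark.executor.instances": str(num_machines * per_machine),
        "spark.executor.memory": f"{heap_gb}g",
        "spark.executor.cores": str(cores),
    }


# Example: 10 machines, each with 120 GB for heaps and 16 cores.
plan = plan_executors(10, 120, 16)
for key, value in plan.items():
    print(f"--conf {key}={value}")
```

With these example numbers the sketch asks for 40 executors of 30 GB and 4 cores each, that is, four JVMs per machine instead of one 120 GB JVM. The right cap depends on the JVM and the workload, so treat the numbers as a starting point rather than a recommendation.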
