Spark Resource Allocation: Number of Cores
I need to understand how to configure cores for a Spark job. My machine has at most 11 cores and 28 GB of memory. Below is how I'm allocating resources for my Spark job; its execution time is 4.9 minutes:
--driver-memory 2g \
--executor-memory 24g \
--executor-cores 10 \
--num-executors 6
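For reference, the totals these flags request can be worked out against the 11-core / 28 GB machine described above (a quick arithmetic sketch):
# Totals requested by the configuration above, vs. the machine's capacity:
echo "cores requested:  $((6 * 10)) (machine has 11)"        # 60 cores
echo "memory requested: $((6 * 24)) GB (machine has 28 GB)"  # 144 GB
# A cluster manager can only grant what the node actually has, so the
# effective allocation is capped well below these requested totals.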
But I came across multiple articles saying the number of executor cores should be ~5. When I ran the job with that configuration, its execution time increased to 6.9 minutes:
--driver-memory 2g \
--executor-memory 24g \
--executor-cores 5 \
--num-executors 6
It all depends on the behaviour of the job; one configuration does not optimise all needs.
--executor-cores means the number of cores each executor uses.
If that number is too big (>5), then the machine's disk and network (which are shared among all executor cores on that machine) become a bottleneck. If it is too small (~1), the job will not achieve good data parallelism and will not benefit from the locality of data on the same machine.
TLDR: --executor-cores 5 is fine.
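For the 11-core / 28 GB machine above, a sizing that follows the ~5-core rule might look like the following. This is only a sketch: it assumes roughly one core and a few GB are reserved for the OS, the driver, and per-executor overhead (spark.executor.memoryOverhead, about 10% of executor memory by default), and the best figures still depend on the job:
# ~1 core reserved for the OS leaves 10 cores -> 2 executors of 5 cores each.
# 28 GB - 2 GB driver - OS and per-executor overhead leaves roughly 11 GB per executor.
--driver-memory 2g \
--executor-memory 11g \
--executor-cores 5 \
--num-executors 2
Whether this beats the 10-core run still depends on the job's shuffle volume and I/O pattern, which is the point made above.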