Spark Resource Allocation: Number of Cores

I need to understand how to configure cores for a Spark job. My machine can have a maximum of 11 cores and 28 GB of memory. Below is how I'm allocating resources for my Spark job; its execution time is 4.9 mins:

--driver-memory 2g \
--executor-memory 24g \
--executor-cores 10 \
--num-executors 6

But I ran through multiple articles mentioning that the number of cores should be ~5. When I ran the job with that configuration, its execution time increased to 6.9 mins:

--driver-memory 2g \
--executor-memory 24g \
--executor-cores 5 \
--num-executors 6
  1. Will there be any issue with keeping the number of cores close to the max value (10 in my case)?
  2. Are there any benefits to keeping the number of cores at 5, as suggested in many articles?
  3. In general, what factors should be considered when determining the number of cores?

It all depends on the behaviour of the job; one config does not optimise all needs.

--executor-cores sets the number of cores each executor gets.

If that number is too big (>5), then the machine's disk and network (which are shared among all executor cores on that machine) will become a bottleneck. If that number is too small (~1), then the job will not achieve good parallelism within each executor and won't benefit from the locality of data on the same machine.
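
For concreteness, here is a minimal sizing sketch under that advice for the asker's 11-core / 28 GB machine. The 1 core and 1 GB reserved for the OS and cluster daemons, and the ~10% of executor memory set aside for Spark's memory overhead, are assumptions, not values from the question:

# Assumed: 11 cores / 28 GB per node, 1 core + 1 GB reserved for OS/daemons.
#   usable cores  : 11 - 1 = 10   ->  10 / 5 cores each = 2 executors per node
#   usable memory : 28 - 1 = 27 GB  ->  27 / 2 ≈ 13 GB per executor
#   executor memory: ≈13 GB minus ~10% overhead  ->  ~12 GB
--driver-memory 2g \
--executor-memory 12g \
--executor-cores 5 \
--num-executors 2

With more worker nodes of the same size, --num-executors would scale as 2 × (number of nodes).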

TLDR: --executor-cores 5 is fine.
