Increase the Spark workers' cores
I have installed Spark on a master and 2 workers. The original core number per worker is 8. When I start the master, the workers work properly without any problem, but the problem is that in the Spark GUI each worker has only 2 cores assigned.

Kindly, how can I increase the number of cores so that each worker works with 8 cores?
The setting which controls cores per executor is spark.executor.cores. See the docs. It can be set either via a spark-submit command-line argument or in spark-defaults.conf. The file is usually located in /etc/spark/conf (ymmv). You can search for the conf file with find / -type f -name spark-defaults.conf

spark.executor.cores 8
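Equivalently, the same property can be passed per application at submit time instead of editing the file; the application path below is a placeholder:

```shell
# In spark-defaults.conf (one property per line):
#   spark.executor.cores 8

# Or per application via spark-submit (my_app.py is a placeholder):
spark-submit --conf spark.executor.cores=8 my_app.py
```

Values passed with --conf override the entries in spark-defaults.conf for that submission only.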
However, the setting does not guarantee that each executor will always get all the available cores. This depends on your workload.
If you schedule tasks on a dataframe or RDD, Spark will run a parallel task for each partition of the dataframe. A task will be scheduled to an executor (a separate JVM), and the executor can run multiple tasks in parallel in JVM threads, one per core.
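As a back-of-the-envelope illustration of that scheduling model (plain Python arithmetic, not Spark API calls; the partition count is an assumption for the example):

```python
import math

# Illustrative numbers only.
partitions = 28            # Spark runs one task per partition
executors = 2
cores_per_executor = 7

# Total task slots available across the cluster at any moment.
slots = executors * cores_per_executor

# Number of scheduling "waves" needed to process every partition.
waves = math.ceil(partitions / slots)

print(slots)   # 14
print(waves)   # 2
```

So 28 partitions on 14 slots finish in two waves; fewer partitions than slots would leave cores idle.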
Also, an executor will not necessarily run on a separate worker. If there is enough memory, 2 executors can share a worker node.
In order to use all the cores, the setup in your case could look as follows, given you have 10 gig of memory on each node:
spark.default.parallelism 14
spark.executor.instances 2
spark.executor.cores 7
spark.executor.memory 9g
Setting memory to 9g will make sure each executor is assigned to a separate node. Each executor will have 7 cores available. Each dataframe operation will then be scheduled to 14 concurrent tasks, which will be distributed 7 to each executor. You can also repartition a dataframe, instead of setting default.parallelism. One core and 1 gig of memory are left for the operating system.