Increase the Spark workers' cores

I have installed Spark on a master and 2 workers. The original number of cores per worker is 8. When I start the master, the workers work properly without any problem, but the problem is that in the Spark GUI each worker only has 2 cores assigned.

How can I increase the number of cores so that each worker works with all 8 cores?

The setting which controls cores per executor is spark.executor.cores. See the docs. It can be set either via a spark-submit command-line argument or in spark-defaults.conf. The file is usually located in /etc/spark/conf (ymmv). You can search for the conf file with find / -type f -name spark-defaults.conf:

spark.executor.cores 8
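
The same property can also be passed on the command line when submitting the application. A minimal sketch, where the application class and jar are placeholder names:

spark-submit --conf spark.executor.cores=8 --class com.example.MyApp my-app.jar

The shorthand flag --executor-cores 8 is equivalent.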

However, the setting does not guarantee that each executor will always get all the available cores; this depends on your workload.

If you schedule tasks on a DataFrame or RDD, Spark will run a parallel task for each partition of the DataFrame. A task is scheduled to an executor (a separate JVM), and the executor can run multiple tasks in parallel, in JVM threads, one per core.
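
A minimal Scala sketch of that relationship, assuming an existing SparkSession named spark (the DataFrame and its size are made up for illustration):

// the number of partitions determines how many tasks Spark can run in parallel
val df = spark.range(0L, 1000000L)      // a Dataset with the default partitioning
println(df.rdd.getNumPartitions)        // one parallel task per partition
df.rdd.mapPartitions(it => Iterator(it.size)).collect()  // executes one task per partition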

Also, an executor will not necessarily run on a separate worker. If there is enough memory, 2 executors can share a worker node.

In order to use all the cores, the setup in your case could look as follows, given you have 10 GB of memory on each node:

spark.default.parallelism 14
spark.executor.instances 2
spark.executor.cores 7
spark.executor.memory 9g

Setting memory to 9g makes sure that each executor is assigned to a separate node. Each executor will then have 7 cores available, and each DataFrame operation will be scheduled as 14 concurrent tasks, distributed 7 to each executor. You can also repartition a DataFrame instead of setting default.parallelism. One core and 1 GB of memory are left for the operating system.
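
A short sketch of the repartition alternative, assuming the same SparkSession spark and a DataFrame df as above:

// give the DataFrame itself 14 partitions, so 14 tasks (2 executors x 7 cores) can run at once
val repartitioned = df.repartition(14)
println(repartitioned.rdd.getNumPartitions)  // 14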
