
Spark repartition() function increases the number of tasks per executor; how to increase the number of executors

I'm working on an IBM server with 30 GB of RAM (12 cores). I have given all the cores to Spark, but it still uses only 1 core. I tried this while loading the file and it worked with the command

val name_db_rdd = sc.textFile("input_file.csv", 12)

and was able to use all 12 cores for the initial jobs, but I also want the intermediate operations to be split across executors so that they can use all 12 cores.


val new_rdd = rdd.repartition(12)


As you can see in the image, only 1 executor is running, and the repartition function splits the data into many tasks on that single executor.

It depends on how you're launching the job, but you probably want to add --num-executors to your command line when you launch your Spark job.

Something like

spark-submit \
    --num-executors 10 \
    --driver-memory 2g \
    --executor-memory 2g \
    --executor-cores 1 \

might work well for you.
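
If it is more convenient to set these in code rather than on the command line, the same settings can be passed through SparkConf. The following is a minimal sketch, not from the original question; the app name and values are illustrative, and spark.executor.instances only takes effect on cluster managers such as YARN:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: the same executor settings expressed as configuration properties.
// Values are illustrative; spark.executor.instances is honored on YARN,
// not in local mode, where a single executor runs inside the driver JVM.
val conf = new SparkConf()
  .setAppName("repartition-example")        // hypothetical app name
  .set("spark.executor.instances", "10")
  .set("spark.executor.memory", "2g")
  .set("spark.executor.cores", "1")
val sc = new SparkContext(conf)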

Have a look at Running Spark on YARN for more details, though some of the switches mentioned there are YARN-specific.
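
Once the job is up, one way to check how many executors the application actually received (besides the Executors tab in the Spark UI) is the status tracker. This is a sketch that assumes an existing SparkContext named sc and a Spark version that exposes this API:

// Sketch: confirm how many executors and how much default parallelism
// the running application actually has (assumes a SparkContext `sc`).
val executorInfos = sc.statusTracker.getExecutorInfos
println(s"Executors (including driver): ${executorInfos.length}")
println(s"Default parallelism: ${sc.defaultParallelism}")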
