Spark repartition() function increases the number of tasks per executor, how to increase the number of executors
I'm working on an IBM server with 30 GB of RAM (12 cores). I have given all the cores to Spark, but it still uses only one. While loading the file, I managed to get all the cores used with the command
val name_db_rdd = sc.textFile("input_file.csv",12)
and that provided all 12 cores to the initial jobs, but I also want the intermediate operations to be split across the executors so that all 12 cores are used.
[Image: Spark UI screenshot showing a single executor]
val new_rdd = rdd.repartition(12)
As you can see in the image, only one executor is running, and the repartition() call split the data into many tasks on that single executor.
It depends on how you're launching the job, but you probably want to add --num-executors to your command line when you launch your Spark job.
Something like
spark-submit \
--num-executors 10 \
--driver-memory 2g \
--executor-memory 2g \
--executor-cores 1 \
might work well for you.
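Note that repartition() only controls how many tasks exist; how many can run at once is (number of executors) × (cores per executor). A quick sketch of that arithmetic, using the illustrative numbers from the command above:

```scala
// Illustrative only: parallelism math for the settings above.
val partitions = 12        // tasks produced by rdd.repartition(12)
val numExecutors = 10      // --num-executors 10
val executorCores = 1      // --executor-cores 1

// Tasks that can run simultaneously across the cluster:
val concurrentTasks = numExecutors * executorCores

// "Waves" of tasks needed to work through all partitions:
val waves = math.ceil(partitions.toDouble / concurrentTasks).toInt

println(s"$concurrentTasks concurrent tasks, $waves wave(s)")
// prints "10 concurrent tasks, 2 wave(s)"
```

With only one single-core executor, as in the screenshot, those 12 tasks run one at a time, which is why repartitioning alone did not help.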
Have a look at Running Spark on YARN for more details, though some of the switches mentioned there are YARN-specific.
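Alternatively, on YARN you can let Spark scale the executor count itself via dynamic allocation. These are standard Spark configuration properties; the executor bounds below are just example values, and your-app.jar is a hypothetical placeholder:

```shell
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=12 \
  your-app.jar   # hypothetical application jar
```

Dynamic allocation requires the external shuffle service so that shuffle files survive when idle executors are removed.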