Spark repartition() function increases the number of tasks per executor, how to increase the number of executors
I'm working on an IBM server with 30 GB of RAM (12 cores). I have given all the cores to Spark, but it still uses only one. While loading the file, I managed to get all the cores used with the command
val name_db_rdd = sc.textFile("input_file.csv",12)
and that provided all 12 cores to the initial jobs, but I also want the intermediate operations to be split across the executors so that all 12 cores are used.
[Image: Spark UI screenshot showing a single executor]
val new_rdd = rdd.repartition(12)
As you can see in the image, only one executor is running, and the repartition() call split the data into many tasks on that single executor.
It depends on how you're launching the job, but you probably want to add --num-executors to your command line when you launch your Spark job.
Something like
spark-submit \
--num-executors 10 \
--driver-memory 2g \
--executor-memory 2g \
--executor-cores 1 \
might work well for you.
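Note that repartition() only controls how many tasks exist; how many can run at once is (number of executors) × (cores per executor). A quick sketch of that arithmetic, using the illustrative numbers from the command above:

```scala
// Illustrative only: parallelism math for the settings above.
val partitions = 12        // tasks produced by rdd.repartition(12)
val numExecutors = 10      // --num-executors 10
val executorCores = 1      // --executor-cores 1

// Tasks that can run simultaneously across the cluster:
val concurrentTasks = numExecutors * executorCores

// "Waves" of tasks needed to work through all partitions:
val waves = math.ceil(partitions.toDouble / concurrentTasks).toInt

println(s"$concurrentTasks concurrent tasks, $waves wave(s)")
// prints "10 concurrent tasks, 2 wave(s)"
```

With only one single-core executor, as in the screenshot, those 12 tasks run one at a time, which is why repartitioning alone did not help.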
Have a look at Running Spark on YARN for more details, though some of the switches mentioned there are YARN-specific.
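Alternatively, on YARN you can let Spark scale the executor count itself via dynamic allocation. These are standard Spark configuration properties; the executor bounds below are just example values, and your-app.jar is a hypothetical placeholder:

```shell
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=12 \
  your-app.jar   # hypothetical application jar
```

Dynamic allocation requires the external shuffle service so that shuffle files survive when idle executors are removed.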