
Spark repartition() function increases the number of tasks per executor; how to increase the number of executors

I'm working on an IBM server with 30 GB RAM (12 cores). I have given all the cores to Spark, but it still uses only 1 core. I managed to spread the work while loading the file with the command

val name_db_rdd = sc.textFile("input_file.csv",12)

and that lets the starting jobs use all 12 cores. However, I also want to split the work between the intermediate operations across the executors, so that those stages can use all 12 cores as well.


val new_rdd = rdd.repartition(12)

[Screenshot of the Spark UI: all tasks of the repartitioned stage running on a single executor]

As you can see in this screenshot, only 1 executor is running, and the repartition() call only splits the data into many tasks on that one executor.
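For reference, here is a minimal sketch of the kind of pipeline I mean (the split logic, key field, and output path are just placeholders), with the repartition() placed between the intermediate operations:

val lines = sc.textFile("input_file.csv", 12)        // first stage uses the 12 input partitions
val parsed = lines.map(_.split(","))                 // still follows the input partitioning
val repartitioned = parsed.repartition(12)           // reshuffle into 12 partitions
val counts = repartitioned
  .map(fields => (fields(0), 1L))                    // later stages now run as 12 tasks
  .reduceByKey(_ + _)
counts.saveAsTextFile("output_dir")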

It depends on how you're launching the job, but you probably want to add --num-executors to your command line when launching your Spark job.

Something like

spark-submit \
    --num-executors 10 \
    --driver-memory 2g \
    --executor-memory 2g \
    --executor-cores 1 \
    ...

might work well for you.

Have a look at the Running Spark on YARN documentation for more details, though some of the switches mentioned there are YARN-specific.
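If you prefer, the same knobs can also be set programmatically through SparkConf before the SparkContext is created (a sketch with placeholder values; spark.executor.instances is the property behind --num-executors and is honoured by cluster managers such as YARN):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("my-job")                        // placeholder app name
  .set("spark.executor.instances", "10")       // same as --num-executors 10
  .set("spark.executor.memory", "2g")          // same as --executor-memory 2g
  .set("spark.executor.cores", "1")            // same as --executor-cores 1
val sc = new SparkContext(conf)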
