
Spark repartition() function increases the number of tasks per executor; how to increase the number of executors

I'm working on an IBM server with 30 GB RAM (12 cores). I have given all the cores to Spark, but it still uses only 1 core. I managed to spread the work while loading the file with the command

val name_db_rdd = sc.textFile("input_file.csv",12)

and that lets the starting jobs use all 12 cores. However, I also want to split the work between the intermediate operations across the executors, so that those stages can use all 12 cores as well.


val new_rdd = rdd.repartition(12)

[Screenshot of the Spark UI: all tasks of the repartitioned stage running on a single executor]

As you can see in this screenshot, only 1 executor is running, and the repartition() call only splits the data into many tasks on that one executor.
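For reference, here is a minimal sketch of the kind of pipeline I mean (the split logic, key field, and output path are just placeholders), with the repartition() placed between the intermediate operations:

val lines = sc.textFile("input_file.csv", 12)        // first stage uses the 12 input partitions
val parsed = lines.map(_.split(","))                 // still follows the input partitioning
val repartitioned = parsed.repartition(12)           // reshuffle into 12 partitions
val counts = repartitioned
  .map(fields => (fields(0), 1L))                    // later stages now run as 12 tasks
  .reduceByKey(_ + _)
counts.saveAsTextFile("output_dir")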

It depends on how you're launching the job, but you probably want to add --num-executors to your command line when launching your Spark job.

Something like

spark-submit \
    --num-executors 10 \
    --driver-memory 2g \
    --executor-memory 2g \
    --executor-cores 1 \
    ...

might work well for you.

Have a look at the Running Spark on YARN documentation for more details, though some of the switches mentioned there are YARN-specific.
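If you prefer, the same knobs can also be set programmatically through SparkConf before the SparkContext is created (a sketch with placeholder values; spark.executor.instances is the property behind --num-executors and is honoured by cluster managers such as YARN):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("my-job")                        // placeholder app name
  .set("spark.executor.instances", "10")       // same as --num-executors 10
  .set("spark.executor.memory", "2g")          // same as --executor-memory 2g
  .set("spark.executor.cores", "1")            // same as --executor-cores 1
val sc = new SparkContext(conf)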
