
Does spark-submit --master local[4] limit the whole app to 4 cores, or just Spark workers?

I have a TensorFlow program that I want to run on the master node of an AWS EMR cluster; it only has a very light Spark dependency. I want the spark-submit command to leave as many resources available to TensorFlow as possible. I was thinking that if I did

spark-submit --master local[4] myprogram.py

then Spark would only get 4 cores, and myprogram.py would get the rest - but maybe I am limiting the number of cores available to the whole application to only 4? (Say the master node has 32 cores.)
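One way I could check this empirically, I suppose, is to compare what Spark reports as its parallelism with what the OS reports as the machine's core count. A minimal sketch (check_cores.py is just an illustrative name, not my real script):

# check_cores.py - hypothetical probe; submit with:
#   spark-submit --master local[4] check_cores.py
import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# In local[N] mode, Spark's default parallelism equals N
print("Spark default parallelism:", sc.defaultParallelism)  # expect 4
# local[4] caps Spark's task threads, not OS scheduling: the Python
# driver process itself still sees every core on the box
print("Cores visible to the OS:", os.cpu_count())           # expect 32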

The TensorFlow program is not distributed. The whole flow is one big Spark app that does a lot of ETL on the task nodes, and then the training happens only on the master node - but the training still uses Spark a bit. That is the awkwardness: ordinarily I would build my own Python environment with TensorFlow and PySpark, but since I'm on EMR, I don't want to manage two Spark installations.
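To illustrate the shape of the program (everything here - the S3 path, the model, the thread counts - is a placeholder, not my actual code; the tf.config.threading calls are just the standard TF 2.x knobs for pinning TensorFlow's own thread pools, which are independent of Spark's local[N] setting):

# myprogram.py - illustrative outline only; names and paths are placeholders
import tensorflow as tf
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# ETL stage: Spark does the heavy lifting
df = spark.read.parquet("s3://my-bucket/etl-output/")  # placeholder path
features = df.toPandas()  # assume the result is small enough to train on locally

# Training stage: TensorFlow manages its own threads, separate from local[N];
# these must be set before the first TF op runs
tf.config.threading.set_intra_op_parallelism_threads(28)  # illustrative value
tf.config.threading.set_inter_op_parallelism_threads(2)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
# model.fit(...)  # training uses TF's thread pools, not Spark's task threads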

With spark-submit --master local[4] myprogram.py, Spark runs locally with 4 worker threads.

Even if your cluster has 32 cores, the Spark application will use only 4 of them.

The deployment is a non-distributed, single-JVM mode: Spark spawns all of the execution components - the driver, executor, LocalSchedulerBackend, and master - in the same JVM.

The number of tasks that can run concurrently is controlled by the number of threads specified in the master URL. In your case, that number is 4.
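A quick way to see that limit in action - a minimal sketch, assuming you submit it with --master local[4]: eight slow tasks finish in two waves of four, because only 4 task threads exist.

# parallelism_demo.py - hypothetical demo; run with:
#   spark-submit --master local[4] parallelism_demo.py
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

def slow_task(i):
    time.sleep(2)           # each task takes ~2 seconds
    return (i, time.time())

start = time.time()
# 8 partitions -> 8 tasks, but only 4 can run at once under local[4]
results = sc.parallelize(range(8), 8).map(slow_task).collect()
print("Elapsed: %.1fs" % (time.time() - start))  # ~4s (two waves of 4), not ~2s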
