
Does spark-submit --master local[4] limit the whole app to 4 cores, or just Spark workers?

I have a TensorFlow program that I want to run on the master node of an AWS EMR cluster; it only has a very light Spark dependency. I want the spark-submit command to leave as many resources available to TensorFlow as possible. I was thinking that if I did

spark-submit --master local[4] myprogram.py

then Spark would only get 4 cores, and myprogram.py would get the rest - but maybe I am limiting the number of cores available to the whole application to only 4? (Say the master node has 32 cores.)
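One way I could check this empirically, I suppose, is to compare what Spark reports as its parallelism with what the OS reports as the machine's core count. A minimal sketch (check_cores.py is just an illustrative name, not my real script):

# check_cores.py - hypothetical probe; submit with:
#   spark-submit --master local[4] check_cores.py
import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# In local[N] mode, Spark's default parallelism equals N
print("Spark default parallelism:", sc.defaultParallelism)  # expect 4
# local[4] caps Spark's task threads, not OS scheduling: the Python
# driver process itself still sees every core on the box
print("Cores visible to the OS:", os.cpu_count())           # expect 32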

The TensorFlow program is not distributed. The whole flow is one big Spark app that does a lot of ETL on the task nodes, and then the training happens only on the master node - but the training still uses Spark a bit. That is the awkwardness: ordinarily I would build my own Python environment with TensorFlow and PySpark, but since I'm on EMR, I don't want to manage two Spark installations.
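To illustrate the shape of the program (everything here - the S3 path, the model, the thread counts - is a placeholder, not my actual code; the tf.config.threading calls are just the standard TF 2.x knobs for pinning TensorFlow's own thread pools, which are independent of Spark's local[N] setting):

# myprogram.py - illustrative outline only; names and paths are placeholders
import tensorflow as tf
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# ETL stage: Spark does the heavy lifting
df = spark.read.parquet("s3://my-bucket/etl-output/")  # placeholder path
features = df.toPandas()  # assume the result is small enough to train on locally

# Training stage: TensorFlow manages its own threads, separate from local[N];
# these must be set before the first TF op runs
tf.config.threading.set_intra_op_parallelism_threads(28)  # illustrative value
tf.config.threading.set_inter_op_parallelism_threads(2)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
# model.fit(...)  # training uses TF's thread pools, not Spark's task threads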

With spark-submit --master local[4] myprogram.py, Spark runs locally with 4 worker threads.

Even if your cluster has 32 cores, the Spark application will use only 4 of them.

The deployment is a non-distributed, single-JVM mode: Spark spawns all of the execution components - the driver, executor, LocalSchedulerBackend, and master - in the same JVM.

The number of tasks that can run concurrently is controlled by the number of threads specified in the master URL. In your case, that number is 4.
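A quick way to see that limit in action - a minimal sketch, assuming you submit it with --master local[4]: eight slow tasks finish in two waves of four, because only 4 task threads exist.

# parallelism_demo.py - hypothetical demo; run with:
#   spark-submit --master local[4] parallelism_demo.py
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

def slow_task(i):
    time.sleep(2)           # each task takes ~2 seconds
    return (i, time.time())

start = time.time()
# 8 partitions -> 8 tasks, but only 4 can run at once under local[4]
results = sc.parallelize(range(8), 8).map(slow_task).collect()
print("Elapsed: %.1fs" % (time.time() - start))  # ~4s (two waves of 4), not ~2s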
