I have several Python jobs that I need to execute with Spark. The Python code doesn't use any Spark-specific distributed libraries; it just uses pandas, scipy, and sklearn to manipulate data.
I submit each job to Spark with the command: spark-submit --master spark://ip:7077 python_code.py
When I submit several such jobs, all of them execute only on the master. The CPU on the master goes to 100%, but the worker nodes are all idle. I would have expected Spark's resource manager to distribute the load across the cluster.
I know that my code doesn't use any of the distributed libraries provided by Spark, but is there a way to distribute complete jobs to different nodes?
Without Spark action APIs (collect/take/first/saveAsTextFile), nothing is executed on the executors. It's not possible to distribute plain Python code just by submitting it to Spark: the driver runs your script as an ordinary Python program, which is why only the master's CPU is busy. To use the executors, the work has to be expressed through Spark's distributed APIs (RDDs or DataFrames) and triggered by an action.
If you want to keep the code as plain Python, you can look at other parallel-processing libraries such as Dask ( https://github.com/dask/dask ), which can schedule ordinary pandas/scipy/sklearn functions across a cluster.
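As a quick sketch of that approach (assuming dask is installed), dask.delayed wraps ordinary Python functions into tasks; here the default local scheduler runs them, and pointing a dask.distributed.Client at a scheduler would run the same task graph across cluster workers. process_chunk is a hypothetical stand-in for your real work.

```python
import dask

@dask.delayed
def process_chunk(x):
    # hypothetical placeholder for the per-chunk pandas/scipy/sklearn work
    return x + 1

# Build a graph of independent tasks, then execute them in parallel.
tasks = [process_chunk(i) for i in range(4)]
results = dask.compute(*tasks)
print(results)  # (1, 2, 3, 4)
```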