
Apache Spark Python to Scala translation

If I understand it correctly, Apache YARN receives the Application Master and Node Manager as JAR files, and they are executed as Java processes on the nodes of the YARN cluster. When I write a Spark program in Python, is it compiled into a JAR somehow? If not, how is Spark able to execute Python logic on the YARN cluster nodes?

The PySpark driver program uses Py4J (http://py4j.sourceforge.net/) to launch a JVM and create a SparkContext. Spark RDD operations written in Python are mapped to operations on PythonRDD.
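
As a rough illustration, the sketch below (public PySpark API only) shows that bridge: creating a SparkContext starts a JVM through Py4J, and the lineage of an RDD transformed in Python usually contains a PythonRDD node. The exact debug output varies between Spark versions.

    # Minimal sketch: the SparkContext below starts a JVM through Py4J, and the
    # debug string of a Python-transformed RDD usually shows a PythonRDD node.
    from pyspark import SparkContext

    sc = SparkContext(appName="py4j-bridge-demo")   # launches a JVM via Py4J

    rdd = sc.parallelize(range(10)).map(lambda x: x * 2)

    # The map() above was recorded on the Python side; the JVM-side plan that
    # will execute the pickled lambda typically appears as PythonRDD[...] here.
    print(rdd.toDebugString().decode())

    print(rdd.collect())
    sc.stop()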

On the remote workers, PythonRDD launches sub-processes that run Python. The data and code are passed from the remote worker's JVM to its Python sub-process using pipes.
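
One way to make those worker-side Python processes visible (a small experiment, assuming a working PySpark installation) is to compare the driver's process id with the ids reported inside a map function; the collected pids come from the Python sub-processes, not from the driver.

    # Minimal sketch: the lambda runs in Python sub-processes next to the
    # executors, so the pids it reports differ from the driver's pid.
    import os
    from pyspark import SparkContext

    sc = SparkContext(appName="worker-subprocess-demo")
    driver_pid = os.getpid()

    worker_pids = (sc.parallelize(range(8), 4)
                     .map(lambda _: os.getpid())
                     .distinct()
                     .collect())

    print("driver pid :", driver_pid)
    print("worker pids:", worker_pids)   # Python worker processes, not the driver
    sc.stop()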

Therefore, your YARN nodes need to have Python installed for this to work.
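
If the nodes have several interpreters, the one used by the executor-side Python workers can be chosen explicitly. A minimal sketch, assuming /usr/bin/python3 exists on the nodes: spark.executorEnv.* sets an environment variable for executors, and spark.pyspark.python is the equivalent configuration key in newer Spark releases.

    # Minimal sketch: point the executor-side Python workers at a specific
    # interpreter (the path is an assumption; use whatever exists on your nodes).
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("python-on-yarn-demo")
            # Environment variable read by the Python worker launcher on executors.
            .set("spark.executorEnv.PYSPARK_PYTHON", "/usr/bin/python3")
            # Equivalent configuration key available in newer Spark releases.
            .set("spark.pyspark.python", "/usr/bin/python3"))

    sc = SparkContext(conf=conf)
    print(sc.parallelize([1, 2, 3]).sum())
    sc.stop()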

The Python code is not compiled into a JAR, but is distributed around the cluster by Spark. To make this possible, user functions written in Python are pickled using the following code: https://github.com/apache/spark/blob/master/python/pyspark/cloudpickle.py
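
The effect of that pickling can be reproduced outside Spark. The sketch below uses the standalone cloudpickle package (assumed to be installed; PySpark bundles its own copy as pyspark.cloudpickle) to turn a closure into bytes and rebuild it, which is essentially how user functions travel to the workers.

    # Minimal sketch: a closure is serialised by value with cloudpickle and can
    # be reconstructed from the resulting bytes, e.g. in a worker process.
    import pickle
    import cloudpickle

    def make_scaler(factor):
        return lambda x: x * factor

    scale = make_scaler(3)

    payload = cloudpickle.dumps(scale)   # plain pickle would reject this lambda
    restored = pickle.loads(payload)     # standard pickle can load the bytes back

    print(restored(10))                  # 30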

Source: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
