
Apache Spark Python to Scala translation

If I understand it correctly, Apache YARN receives the Application Master and Node Manager as JAR files, and they are executed as Java processes on the nodes of the YARN cluster. When I write a Spark program in Python, is it compiled into a JAR somehow? If not, how is Spark able to execute Python logic on the YARN cluster nodes?

The PySpark driver program uses Py4J (http://py4j.sourceforge.net/) to launch a JVM and create a SparkContext. Spark RDD operations written in Python are mapped to operations on PythonRDD.
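
As a rough illustration, the sketch below (public PySpark API only) shows that bridge: creating a SparkContext starts a JVM through Py4J, and the lineage of an RDD transformed in Python usually contains a PythonRDD node. The exact debug output varies between Spark versions.

    # Minimal sketch: the SparkContext below starts a JVM through Py4J, and the
    # debug string of a Python-transformed RDD usually shows a PythonRDD node.
    from pyspark import SparkContext

    sc = SparkContext(appName="py4j-bridge-demo")   # launches a JVM via Py4J

    rdd = sc.parallelize(range(10)).map(lambda x: x * 2)

    # The map() above was recorded on the Python side; the JVM-side plan that
    # will execute the pickled lambda typically appears as PythonRDD[...] here.
    print(rdd.toDebugString().decode())

    print(rdd.collect())
    sc.stop()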

On the remote workers, PythonRDD launches sub-processes that run Python. The data and code are passed from the remote worker's JVM to its Python sub-process using pipes.
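
One way to make those worker-side Python processes visible (a small experiment, assuming a working PySpark installation) is to compare the driver's process id with the ids reported inside a map function; the collected pids come from the Python sub-processes, not from the driver.

    # Minimal sketch: the lambda runs in Python sub-processes next to the
    # executors, so the pids it reports differ from the driver's pid.
    import os
    from pyspark import SparkContext

    sc = SparkContext(appName="worker-subprocess-demo")
    driver_pid = os.getpid()

    worker_pids = (sc.parallelize(range(8), 4)
                     .map(lambda _: os.getpid())
                     .distinct()
                     .collect())

    print("driver pid :", driver_pid)
    print("worker pids:", worker_pids)   # Python worker processes, not the driver
    sc.stop()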

Therefore, your YARN nodes need to have Python installed for this to work.
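
If the nodes have several interpreters, the one used by the executor-side Python workers can be chosen explicitly. A minimal sketch, assuming /usr/bin/python3 exists on the nodes: spark.executorEnv.* sets an environment variable for executors, and spark.pyspark.python is the equivalent configuration key in newer Spark releases.

    # Minimal sketch: point the executor-side Python workers at a specific
    # interpreter (the path is an assumption; use whatever exists on your nodes).
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("python-on-yarn-demo")
            # Environment variable read by the Python worker launcher on executors.
            .set("spark.executorEnv.PYSPARK_PYTHON", "/usr/bin/python3")
            # Equivalent configuration key available in newer Spark releases.
            .set("spark.pyspark.python", "/usr/bin/python3"))

    sc = SparkContext(conf=conf)
    print(sc.parallelize([1, 2, 3]).sum())
    sc.stop()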

The Python code is not compiled into a JAR, but is distributed around the cluster by Spark. To make this possible, user functions written in Python are pickled using the following code: https://github.com/apache/spark/blob/master/python/pyspark/cloudpickle.py
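
The effect of that pickling can be reproduced outside Spark. The sketch below uses the standalone cloudpickle package (assumed to be installed; PySpark bundles its own copy as pyspark.cloudpickle) to turn a closure into bytes and rebuild it, which is essentially how user functions travel to the workers.

    # Minimal sketch: a closure is serialised by value with cloudpickle and can
    # be reconstructed from the resulting bytes, e.g. in a worker process.
    import pickle
    import cloudpickle

    def make_scaler(factor):
        return lambda x: x * factor

    scale = make_scaler(3)

    payload = cloudpickle.dumps(scale)   # plain pickle would reject this lambda
    restored = pickle.loads(payload)     # standard pickle can load the bytes back

    print(restored(10))                  # 30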

Source: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
