
How to correctly set the Python version in Spark?

My Spark version is 2.4.0, and the cluster has both Python 2.7 and Python 3.7. The default version is Python 2.7. Now I want to submit a PySpark program that uses Python 3.7. I tried two ways, but neither of them works.

  1.  spark2-submit --master yarn \
        --conf "spark.pyspark.python=/usr/bin/python3" \
        --conf "spark.pyspark.driver.python=/usr/bin/python3" \
        pi.py

    It doesn't work and says

    Cannot run program "/usr/bin/python3": error=13, Permission denied

    But I actually do have permission; for example, I can run a Python program with /usr/bin/python3 test.py.

  2.  export PYSPARK_PYTHON=/usr/bin/python3
      export PYSPARK_DRIVER_PYTHON=/usr/bin/python3

    With this approach, Spark does not pick up Python 3 at all (a minimal in-script variant of setting these variables is sketched below).
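For reference, here is a minimal sketch of the in-script variant of approach 2, assuming the /usr/bin/python3 path from above and a pi.py-style application. Setting PYSPARK_PYTHON before the SparkSession is created determines which interpreter the executors launch; the driver interpreter is fixed by whatever already launched the script, so this is a sketch of one common setup, not a guaranteed fix.

import os

# PYSPARK_PYTHON must be set before the SparkSession (and its JVM) is created;
# it tells Spark which interpreter to launch for the executor workers.
# The driver interpreter is whatever is already running this script.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"   # assumed path from the question

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("yarn").appName("pi").getOrCreate()
print(spark.sparkContext.pythonVer)   # Python version of the driver process
spark.stop()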

In my experience, including the Spark location in the Python script itself tends to be much easier; use findspark for this.

import findspark
spark_location='/opt/spark-2.4.3/' # Set your own
findspark.init(spark_home=spark_location) 
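For context, findspark.init() adds the pyspark libraries under the given SPARK_HOME to sys.path, so the script can then be launched directly with the interpreter of your choice (e.g. /usr/bin/python3 pi.py) instead of going through spark2-submit. A minimal continuation of the snippet above, assuming findspark is installed and the same spark_home, might look like this:

import findspark
findspark.init(spark_home='/opt/spark-2.4.3/')   # adds pyspark under SPARK_HOME to sys.path

# pyspark is now importable by whichever interpreter runs this script,
# e.g. /usr/bin/python3 pi.py instead of spark2-submit.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pi").getOrCreate()
print(spark.sparkContext.pythonVer)   # Python version of the driver
spark.stop()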

I encountered the same problem.

The solution of configuring the environment variables at the beginning of the script did not work for me (Spark did not pick them up when executing tasks).

Without restarting the cluster, just executing the command below worked for me.

sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh
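If you want to confirm that the executors really picked up the new interpreter after this change, a small check like the following can be run. It is a sketch under the assumption that a SparkSession can be created normally; it simply asks each task for its sys.executable.

import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("python-version-check").getOrCreate()
sc = spark.sparkContext

# Interpreter and Python version on the driver.
print("driver:", sys.executable, sc.pythonVer)

# Interpreters actually used by the executors, collected from a tiny job.
print("executors:", sc.parallelize(range(4), 2)
                      .map(lambda _: sys.executable)
                      .distinct()
                      .collect())

spark.stop()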
