My Spark version is 2.4.0, and the cluster has both Python 2.7 and Python 3.7, with Python 2.7 as the default. I want to submit a PySpark program that uses Python 3.7. I tried two approaches, but neither works.
spark2-submit --master yarn \
  --conf "spark.pyspark.python=/usr/bin/python3" \
  --conf "spark.pyspark.driver.python=/usr/bin/python3" \
  pi.py
This fails with the following error:
Cannot run program "/usr/bin/python3": error=13, Permission denied
But I do have permission: for example, I can run a Python program with /usr/bin/python3 test.py without any problem.
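One thing worth checking for an error=13 here: on YARN the executors typically do not run as your own user, so a binary you can execute yourself may still be unreachable for them. A quick diagnostic sketch (the 'yarn' user name is an assumption about the cluster setup):

```shell
# Show the permission bits on the interpreter and every directory on
# its path; error=13 usually means an execute bit is missing somewhere.
ls -l /usr/bin/python3
stat -c '%A %n' /usr/bin/python3 /usr/bin /usr

# If the containers run as the 'yarn' user (assumption), try it directly:
# sudo -u yarn /usr/bin/python3 --version
```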
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
With this approach, Spark does not pick up Python 3 at all.
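For the environment-variable approach to have any effect, the exports must happen in the same shell session that launches spark2-submit, so the submit process inherits them. A minimal sketch, reusing the pi.py script from the question:

```shell
# Export the interpreter paths, then submit from the same shell.
# Run as-is, these mainly affect the driver; whether executors pick
# them up depends on deploy mode and cluster configuration.
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
spark2-submit --master yarn pi.py
```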
From my experience, I have found it much easier to point the Python script at the Spark installation directly; findspark does exactly this:
import findspark
spark_location = '/opt/spark-2.4.3/'  # set this to your own Spark home
findspark.init(spark_home=spark_location)
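Combining this with the interpreter settings from the question, the whole script can set PYSPARK_PYTHON itself before anything Spark-related is imported. A sketch, assuming findspark and pyspark are installed and that the paths match your machine:

```python
import os

# Hypothetical Spark home; adjust to your own installation.
spark_location = '/opt/spark-2.4.3/'

# Point both driver and workers at Python 3 *before* pyspark is imported,
# since the interpreter choice is read from the environment at startup.
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = '/usr/bin/python3'

import findspark
findspark.init(spark_home=spark_location)

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('py3-check').getOrCreate()
print(spark.sparkContext.pythonVer)  # should report a 3.x version
spark.stop()
```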
I encountered the same problem. Configuring the environment at the beginning of the script (as in Spark not executing tasks) did not work for me. Without restarting the cluster, just running the command below did:
sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh
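If the sed syntax looks opaque, `$a\` means "append the following text after the last line". You can try it on a throwaway copy first, so nothing touches /etc/spark/conf/spark-env.sh until you are confident (the /tmp path below is just for the demonstration):

```shell
# Build a stand-in for spark-env.sh and apply the same append command.
echo '# existing contents' > /tmp/spark-env-demo.sh
sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /tmp/spark-env-demo.sh
tail -n 1 /tmp/spark-env-demo.sh
```

The last command should print the appended export line, confirming what the real command will do to the end of spark-env.sh.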