AWS EMR: can't change the default PySpark Python on bootstrap

I am using AWS with EMR and trying to change the bootstrap script so that the default Python in pyspark is Python 3. I am following this tutorial.

This changes the /usr/lib/spark/conf/spark-env.sh file, but it does not change the Python version in pyspark: my jobs still run with Python 2.7. It only works when I SSH into the machine and explicitly run

$ source /usr/lib/spark/conf/spark-env.sh

When I add this line to the bootstrap script, the bootstrap action fails with an error saying the file was not found:

/bin/bash: /usr/lib/spark/conf/spark-env.sh: No such file or directory

I assume the file does not exist yet at this stage. How can I set the pyspark Python to Python 3 from the bootstrap script?

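Bootstrap actions run before EMR installs applications such as Spark, so spark-env.sh does not exist yet at that point. Instead, add the following to the software configuration when creating the cluster (Create EMR -> Step 1: Software and Steps -> Edit software configuration -> Enter configuration):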

[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "/usr/bin/python3"
        }
      }
    ]
  }
]
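
If you create clusters from a script rather than the console, the same configuration can be generated in code. The following is only a minimal sketch: the helper names `spark_env_python_config` and `write_config` and the file name `spark-python3.json` are made up for illustration, and it assumes Python 3 is installed at /usr/bin/python3 on the cluster nodes.

```python
import json


def spark_env_python_config(python_path="/usr/bin/python3"):
    """Build the EMR configuration list that exports PYSPARK_PYTHON.

    python_path is an assumption: adjust it to wherever Python 3
    actually lives on your cluster nodes.
    """
    return [
        {
            "Classification": "spark-env",
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {"PYSPARK_PYTHON": python_path},
                }
            ],
        }
    ]


def write_config(path, python_path="/usr/bin/python3"):
    """Write the configuration as JSON to `path` and return the path."""
    with open(path, "w") as fh:
        json.dump(spark_env_python_config(python_path), fh, indent=2)
    return path


if __name__ == "__main__":
    # Hypothetical file name, chosen only for this example.
    print("wrote", write_config("spark-python3.json"))
```

The resulting file should be usable with the AWS CLI as `aws emr create-cluster ... --configurations file://spark-python3.json`, and the list returned by `spark_env_python_config()` should be usable as the `Configurations` argument of boto3's `run_job_flow`. To confirm the result on a running cluster, print `sys.version` from inside a pyspark session.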
