
How to set environment variables for the pyspark executor in AWS EMR?

I have an AWS EMR cluster running pyspark applications (or steps, as they are called in AWS EMR).

I want to set environment variables for the pyspark applications, so (after some googling) I put this into the cluster configuration:

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executorEnv.MY_ENV": "some-value"
    }
  }
]

But the environment variable is not available in the pyspark process.
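(For what it's worth, spark.executorEnv.* only sets variables in the executor processes, not in the driver, so a driver-side print of os.environ would not show it. A minimal sketch of how one might check both sides, assuming a SparkSession named spark and the MY_ENV name from above:)

import os

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("env-check").getOrCreate()
sc = spark.sparkContext

# spark.executorEnv.* only applies to executor processes, so look there:
executor_values = (
    sc.parallelize(range(4), 4)
    .map(lambda _: os.environ.get("MY_ENV"))
    .collect()
)
print("MY_ENV on executors:", executor_values)

# The driver is a separate process; spark.executorEnv.* does not affect it:
print("MY_ENV on driver:", os.environ.get("MY_ENV"))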

I also tried:

[
  {
    "Classification": "yarn-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "MY_ENV": "some-value",
        }
      }
    ]
  }
]

I then print the environment variables in the application via:

import os
print(os.environ)

MY_ENV does not show up in either case.

How do I pass environment variables to my pyspark application?

Can you try putting this in spark-env:

[
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "MY_ENV": "some-value"
        }
      }
    ]
  }
]
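(This kind of classification JSON is supplied when the cluster is created, for example through the EMR console or the aws emr create-cluster --configurations option. Assuming the spark-env export above is applied and spark-env.sh is sourced when the application is launched, the variable should then be visible directly in the driver process; a minimal sketch to confirm:)

import os

# The spark-env "export" classification writes MY_ENV into spark-env.sh,
# which is sourced when Spark processes are launched, so the variable
# should now show up without any executor-side tricks:
print(os.environ.get("MY_ENV"))  # expected: "some-value"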
