How to set environment variables for the PySpark executor in AWS EMR?

I have an AWS EMR cluster running PySpark applications (or "steps", as they are called in AWS EMR).

I want to set environment variables for the PySpark applications, and (after some googling) I put this into the cluster configuration:

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executorEnv.MY_ENV": "some-value"
    }
  }
]
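
As far as I can tell, the same executor setting can also be passed per application when building the SparkSession, rather than in the cluster configuration. A minimal sketch, using the placeholder names MY_ENV / some-value from above:

from pyspark.sql import SparkSession

# Minimal sketch: set the executor environment variable when creating the
# session instead of via the EMR cluster configuration. MY_ENV/some-value
# are the placeholder names used in this question.
spark = (
    SparkSession.builder
    .appName("env-var-demo")
    .config("spark.executorEnv.MY_ENV", "some-value")
    .getOrCreate()
)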

But the environment variable is not available in the PySpark process.

I also tried:

[
  {
    "Classification": "yarn-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "MY_ENV": "some-value",
        }
      }
    ]
  }
]

And then I print the environment variables via:

import os
print(os.environ)

MY_ENV does not show up in either case.
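
Note that spark.executorEnv.* is, as far as I understand, only applied to the executor processes and not to the driver, so print(os.environ) at the top of the driver script would not show MY_ENV even when that setting works. A minimal sketch to check the environment from inside an executor task, assuming an existing SparkSession named spark:

import os

# Minimal sketch: run a single task on an executor and report whether MY_ENV
# is visible there. Assumes an existing SparkSession called `spark`.
executor_env = (
    spark.sparkContext
    .parallelize([0], 1)
    .map(lambda _: os.environ.get("MY_ENV", "<not set>"))
    .collect()
)
print(executor_env)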

How do I pass environment variables to my PySpark application?

Can you try to put this in spark-env?

[
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "MY_ENV": "some-value"
        }
      }
    ]
  }
]
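
A quick way to verify this classification once it is applied is to print the variable on both the driver and an executor. A minimal verification sketch, where MY_ENV is the placeholder name from the question:

import os
from pyspark.sql import SparkSession

# Minimal verification sketch, assuming the spark-env export above has been
# applied to the cluster. MY_ENV is the placeholder name from the question.
spark = SparkSession.builder.appName("env-check").getOrCreate()

print("driver:", os.environ.get("MY_ENV"))

executor_values = (
    spark.sparkContext
    .parallelize([0], 1)
    .map(lambda _: os.environ.get("MY_ENV"))
    .collect()
)
print("executors:", executor_values)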
