
EMR fails to run python 3.x

I added a step in EMR to run a pyspark job. However, I end up getting the error

TypeError: makedirs() got an unexpected keyword argument 'exist_ok'

which makes me suspect that the EMR default Python 2.7 is being run. The AMI version I'm using is 5.25.0, which is fairly new and, according to the documentation, should come with Python 3.6 already installed.
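This error is a reliable symptom of the wrong interpreter: the `exist_ok` keyword was added to `os.makedirs` in Python 3.2, so the same call that works on 3.x raises `TypeError` on 2.7. A minimal check:

```python
import os
import sys
import tempfile

# exist_ok was added to os.makedirs in Python 3.2; under EMR's
# default Python 2.7 this exact call raises
#   TypeError: makedirs() got an unexpected keyword argument 'exist_ok'
out_dir = os.path.join(tempfile.mkdtemp(), "nested", "out")
os.makedirs(out_dir, exist_ok=True)  # fine on Python 3
os.makedirs(out_dir, exist_ok=True)  # repeat call is a no-op thanks to exist_ok

print(sys.version_info.major)
```

If this prints `2` on the cluster, the job is not running under the Python you intended.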

#!/usr/bin/env python3.6

I added the shebang to point at the Python 3.6 executable. Is there something else to this?

That is right. The system default version of Python is 2.7, but Python 3.x is also available via the python3 command. You can change which one Spark uses by following the document below.

How do I configure Amazon EMR to run a PySpark job using Python 3.4 or 3.6?

Basically, run the command below on the running cluster:

sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh
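All that one-liner does is append an `export` line to spark-env.sh. A rough Python equivalent of the same edit (sketched against a temporary copy rather than the real /etc/spark/conf/spark-env.sh, which requires sudo on the cluster):

```python
import os
import tempfile

# Stand-in for /etc/spark/conf/spark-env.sh on the cluster.
conf_path = os.path.join(tempfile.mkdtemp(), "spark-env.sh")
with open(conf_path, "w") as f:
    f.write("# existing spark-env contents\n")

# Equivalent of the sed '$a\...' append: add the export at the end
# so executors pick up /usr/bin/python3 as the PySpark interpreter.
with open(conf_path, "a") as f:
    f.write("export PYSPARK_PYTHON=/usr/bin/python3\n")

with open(conf_path) as f:
    contents = f.read()
```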

or set the configuration when you create the cluster, for example:

[
  {
     "Classification": "spark-env",
     "Configurations": [
       {
         "Classification": "export",
         "Properties": {
            "PYSPARK_PYTHON": "/usr/bin/python3"
          }
       }
    ]
  }
]
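If you create the cluster programmatically, the same classification can be passed as a Python structure, e.g. to boto3's `run_job_flow` via its `Configurations` parameter (sketch only; the other `run_job_flow` arguments are omitted):

```python
import json

# Same spark-env/export classification as above, expressed as the
# Python structure you would pass to boto3, e.g.
#   emr.run_job_flow(..., Configurations=configurations)
configurations = [
    {
        "Classification": "spark-env",
        "Configurations": [
            {
                "Classification": "export",
                "Properties": {"PYSPARK_PYTHON": "/usr/bin/python3"},
            }
        ],
    }
]

# json.dumps round-trips it to the JSON form shown above.
print(json.dumps(configurations, indent=2))
```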

That is indeed the solution, but it is worth mentioning that I am using a Data Pipeline to run the cluster. To provide this config, if you choose to edit in Architect rather than provide a JSON config, you need to create a configuration object in the Resources section.

The configuration you create here will cascade to an export configuration nested under it.

The property is simply "PYSPARK_PYTHON" as the key and "/usr/bin/python3" as the value.

In your EMR console, it should then show a configuration object as

spark-env.export

