
How to set Jupyter notebook to Python3 instead of Python2.7 in AWS EMR

I am spinning up an EMR cluster in AWS. The difficulty arises when I try to import Python modules from Jupyter. I have a shell script that runs when the cluster starts and installs the Python modules.

The notebook is set to run using the PySpark kernel.

I believe the problem is that the Jupyter notebook is not pointed at the correct Python on the EMR cluster. The methods I have used to set the notebook to the correct version do not seem to work.

I have set the following configuration. I have also tried changing python to python3.6 and python3.

Configurations=[{
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [{
        "Classification": "export",
        "Properties": {
            "PYSPARK_PYTHON": "python",
            "PYSPARK_DRIVER_PYTHON": "python",
            "SPARK_YARN_USER_ENV": "python"
        }
    }]
}]

I am certain that my shell script is installing the modules, because when I run the following on the EMR command line (via SSH) it works:

python3.6
import boto3

However, when I run the following, it does not work:

python
import boto3

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named boto3

When I run the following in Jupyter, I get the output below:

import sys
import os

print(sys.version)

2.7.16 (default, Jul 19 2019, 22:59:28) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

#!/bin/bash
alias python=python3.6
export PYSPARK_DRIVER_PYTHON="python"
export SPARK_YARN_USER_ENV="python"
sudo python3 -m pip install boto3
sudo python3 -m pip install pandas
sudo python3 -m pip install pymysql
sudo python3 -m pip install xlrd
sudo python3 -m pip install pymssql

When I attempt to import boto3 in Jupyter, I get an error message:

No module named boto3
Traceback (most recent call last):
ImportError: No module named boto3

If you want to use Python3 with EMR notebooks, the recommended way is to use the pyspark kernel and configure Spark to use Python3 from within the notebook:

%%configure -f
{"conf": {"spark.pyspark.python": "python3"}}
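To check that the setting took effect, a cell along the following lines can be run after the %%configure cell. This is only a sketch: the helper names `is_importable` and `interpreter_report` are made up for illustration, and `boto3` is simply the module from the question. It is written to run under both Python 2.7 and Python 3, so it still reports something useful if the kernel is on the old interpreter.

```python
import sys


def is_importable(name):
    """Return True if the module can be imported by this interpreter."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


def interpreter_report(modules=("boto3",)):
    """Collect the interpreter version, its path, and module availability."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        "modules": {name: is_importable(name) for name in modules},
    }


report = interpreter_report()
print("Python {} at {}".format(report["version"], report["executable"]))
for name, available in sorted(report["modules"].items()):
    print("{}: {}".format(name, "importable" if available else "missing"))
```

If the reported version still starts with 2.7, the session is most likely not picking up spark.pyspark.python, and restarting the kernel before rerunning the %%configure cell is worth trying.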

Note that:

  • Any on-cluster configuration related to PYSPARK_PYTHON or PYSPARK_DRIVER_PYTHON is overridden by the EMR notebook configuration. The only way to configure Python3 is from within the notebook, as mentioned above.

  • The pyspark3 kernel is deprecated for Livy 0.4+, so the pyspark kernel is recommended for both Python2 and Python3, with spark.pyspark.python configured accordingly.

  • If you want to install additional Python dependencies that are not already present on the cluster, you can use notebook-scoped libraries. This works for both Python2 and Python3.
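As a rough sketch of the notebook-scoped approach: on EMR releases that support notebook-scoped libraries, the SparkContext `sc` exposed by the PySpark kernel provides `install_pypi_package` and `list_packages`. The wrapper below is hypothetical (the name `install_for_notebook` is illustrative, and the package names are taken from the bootstrap script in the question). It only forwards each name to `install_pypi_package` and assumes it is called from a notebook cell where `sc` already exists.

```python
def install_for_notebook(spark_context, packages):
    """Install each package for the current notebook session only.

    Assumes spark_context is the `sc` object of an EMR notebook PySpark
    session, which provides install_pypi_package().
    """
    installed = []
    for package in packages:
        spark_context.install_pypi_package(package)
        installed.append(package)
    return installed


# In an EMR notebook cell, where `sc` is predefined, usage would look like:
#   install_for_notebook(sc, ["boto3", "pandas", "pymysql"])
#   sc.list_packages()
```

Packages installed this way are scoped to the notebook session, so they do not depend on which interpreter the bootstrap script happened to target.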
