How to set Jupyter notebook to Python3 instead of Python2.7 in AWS EMR
I am spinning up an EMR cluster in AWS. The difficulty arises when I try to import the associated Python modules in Jupyter. I have a shell script that runs when the EMR cluster starts and installs the Python modules.
The notebook is set to run using the PySpark kernel.
I believe the problem is that the Jupyter notebook is not pointed at the correct Python on the EMR cluster. The methods I have used to set the notebook to the correct version do not seem to work.
I have set the following configurations. I have also tried changing python to python3.6 and python3.
Configurations=[{
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [{
        "Classification": "export",
        "Properties": {
            "PYSPARK_PYTHON": "python",
            "PYSPARK_DRIVER_PYTHON": "python",
            "SPARK_YARN_USER_ENV": "python"
        }
    }]
}]
I am certain that my shell script is installing the modules, because the following works when I run it on the EMR command line (via SSH):
python3.6
import boto3
However, when I run the following, it does not work:
python
import boto3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named boto3
When I run the following command in Jupyter I get the output below:
import sys
import os
print(sys.version)
2.7.16 (default, Jul 19 2019, 22:59:28) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
#!/bin/bash
alias python=python3.6
export PYSPARK_DRIVER_PYTHON="python"
export SPARK_YARN_USER_ENV="python"
sudo python3 -m pip install boto3
sudo python3 -m pip install pandas
sudo python3 -m pip install pymysql
sudo python3 -m pip install xlrd
sudo python3 -m pip install pymssql
When I attempt to import boto3 in Jupyter, I get an error message:
No module named boto3
Traceback (most recent call last):
ImportError: No module named boto3
If you want to use Python3 with EMR notebooks, the recommended way is to use the pyspark kernel and configure Spark to use Python3 from within the notebook, as follows:
%%configure -f
{ "conf": { "spark.pyspark.python": "python3" } }
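To check which interpreter a cell is actually using after the session has been reconfigured, a small sketch like the one below can be run in a later cell. The helper name `interpreter_summary` is purely illustrative and not part of EMR or Spark; it only uses the standard `sys` module and is written so that it runs under both Python2 and Python3.

```python
import sys


def interpreter_summary():
    """Return the path and version of the interpreter running this cell."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "is_python3": sys.version_info[0] >= 3,
    }


summary = interpreter_summary()
print(summary)
```

If `is_python3` is still false, the session is presumably still running on the old interpreter, and the configuration cell may need to be run again before any other code.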
Note that:
Any on-cluster configuration related to PYSPARK_PYTHON or PYSPARK_DRIVER_PYTHON is overridden by the EMR notebook configuration. The only way to configure Python3 is from within the notebook, as mentioned above.
The pyspark3 kernel is deprecated for Livy 0.4+, so the pyspark kernel is now recommended for both Python2 and Python3, with spark.pyspark.python configured accordingly.
If you want to install additional Python dependencies that are not already present on the cluster, you can use notebook-scoped libraries. They work for both Python2 and Python3.
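As a rough sketch of how notebook-scoped libraries might be used for the modules in the question: in an EMR notebook the PySpark kernel provides a SparkContext named `sc`, which on EMR releases that support notebook-scoped libraries exposes `install_pypi_package` and `list_packages`. The wrapper function `install_packages` below is a hypothetical helper for illustration, not part of the EMR API.

```python
def install_packages(sc, packages):
    """Install each named package as a notebook-scoped library.

    `sc` is assumed to be the SparkContext that the EMR notebook's
    PySpark kernel provides; `install_packages` itself is a
    hypothetical convenience wrapper.
    """
    installed = []
    for name in packages:
        sc.install_pypi_package(name)
        installed.append(name)
    return installed


# In an EMR notebook cell, where `sc` already exists:
# install_packages(sc, ["boto3", "pymysql"])
# sc.list_packages()
```

Packages installed this way are scoped to the notebook session, so they do not depend on which interpreter the bootstrap script installed into.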