How to install packages on EMR
I created a cluster on AWS with Jupyter and Python 3 installed. I can type code in the cells, and I found that numpy is installed: after
import numpy as np
I am able to access the functions in that package. However, I found that pandas is not there. So in the next cell I typed
!pip install pandas
and it displayed:
Requirement already satisfied: pandas in /mnt/usrmoved/local/lib64/python2.7/site-packages
Requirement already satisfied: pytz>=2011k in /mnt/usrmoved/local/lib/python2.7/site-packages (from pandas)
Requirement already satisfied: numpy>=1.7.0 in /mnt/usrmoved/local/lib64/python2.7/site-packages (from pandas)
Requirement already satisfied: python-dateutil in /mnt/usrmoved/local/lib/python2.7/site-packages (from pandas)
Requirement already satisfied: six>=1.5 in /mnt/usrmoved/local/lib/python2.7/site-packages (from python-dateutil->pandas)
I thought it had been installed successfully, but when I typed
import pandas as pd
in the next cell, it gave me an error:
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-8-af55e7023913> in <module>()
----> 1 import pandas as pd

ImportError: No module named 'pandas'
In general, how should we install Python packages on EMR?
On my laptop, I always run "!pip install package" in Jupyter and it works. Why does it not work in Jupyter on EMR?
I tried installing Python packages using pip install, but I got "pip: command not found". So I used pip3 instead of pip, and it worked.
Using EMR 5.30.1.
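A possible explanation, judging from the pip output in the question: the packages went into a python2.7 site-packages directory while the notebook kernel runs Python 3, so the bare pip command and the kernel may belong to different interpreters. Below is a minimal sketch for checking this from a notebook cell using only the standard library. The helper name pip_install_command is made up for illustration, and the actual install line is left commented out.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this kernel."""
    # Using "<interpreter> -m pip" avoids picking up a pip that belongs
    # to a different Python installation on the PATH.
    return [sys.executable, "-m", "pip", "install", package]


# Show which interpreter the kernel is actually running.
print(sys.executable)
print(sys.version_info[:2])

# Show the command that would install pandas for this interpreter.
print(pip_install_command("pandas"))

# Uncomment to actually run the install (needs network access and
# write permission to site-packages; it only affects the current node):
# subprocess.check_call(pip_install_command("pandas"))
```

If the printed interpreter path is a Python 3 binary while pip reports a python2.7 directory, that mismatch would account for the ImportError. Note that installing this way touches only the node where the kernel runs.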
The conventional method to install Python packages on EMR is to specify the packages needed at cluster creation using a bootstrap action.
This method ensures the packages are installed on all nodes and not just the driver.
aws emr create-cluster \
--name 'test python packages' \
--release-label emr-5.20.0 \
--region us-east-1 \
--use-default-roles \
--instance-type m4.large \
--instance-count 2 \
--bootstrap-actions \
Path="s3://your-bucket/python-modules.sh",Name='Install Python Modules'
The python-modules.sh script would contain the commands to install the Python packages. For example:
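#!/bin/sh
# Install the needed packages on each node.
# Note: plain pip may target Python 2 (as in the question's output);
# for a Python 3 kernel, pip3 may be the one to use instead.
sudo pip install pandas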
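Once the cluster is up, one quick way to confirm from a notebook cell that the bootstrap action took effect for the kernel's interpreter is to check whether the modules can be found. This is only a sketch: missing_modules is a hypothetical helper, and it checks only the node where the kernel runs, not the workers.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means every module listed is importable here.
print(missing_modules(["numpy", "pandas"]))
```

If pandas still shows up as missing, the bootstrap script probably installed it for a different Python than the one the kernel uses.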