Amazon EMR pip install in bootstrap actions runs OK but has no effect
In Amazon EMR, I am using the following script as a custom bootstrap action to install Python packages. The script runs OK (I checked the logs and the packages installed successfully), but when I open a notebook in JupyterLab, I cannot import any of them.
If I open a terminal in JupyterLab and run pip list or pip3 list, none of my packages are there. Even if I go to / and run find . -name mleap, for instance, nothing is found.
Something I have noticed is that on the master node I always get an error saying that bootstrap action 2 has failed (there is no second action, only one). According to this, it is a rare error, yet I get it in all my clusters. However, my cluster eventually gets created and I can use it.
My script is called aws-emr-bootstrap-actions.sh:
#!/bin/bash
sudo python3 -m pip install numpy scikit-learn pandas mleap sagemaker boto3
I suspect it might have something to do with a Docker image being deployed that invalidates my previous installs, but I think (from my Google searches) it is common to use bootstrap actions to install Python packages, so this should work...
The PYSPARK Python interpreter, that is, the one Spark is using, is different from the one into which the OP was installing the modules (as confirmed in comments).
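One way to confirm this kind of mismatch is to ask the notebook kernel which interpreter it is actually running and whether it can see the packages. The following is a minimal sketch, not the OP's code: the helper name describe_interpreter is hypothetical, and it assumes the cell runs in the same kernel that fails to import the packages. PYSPARK_PYTHON is the standard Spark environment variable for choosing the interpreter, but whether it is set in a given EMR kernel is an assumption, so the sketch just reports None when it is absent.

```python
import importlib.util
import os
import sys


def describe_interpreter(packages):
    """Report which interpreter is running and which packages it can import.

    Hypothetical helper for illustration; it only inspects the current process.
    """
    return {
        # Full path of the Python binary executing this code.
        "executable": sys.executable,
        # Interpreter Spark was told to use, or None if the variable is unset.
        "pyspark_python": os.environ.get("PYSPARK_PYTHON"),
        # True for each top-level module name that this interpreter can locate.
        "importable": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


# Import names for the packages in the bootstrap script
# (scikit-learn is imported as "sklearn").
report = describe_interpreter(
    ["numpy", "sklearn", "pandas", "mleap", "sagemaker", "boto3"]
)
print(report)
```

If the reported executable is not the python3 that the bootstrap action invoked, the packages were installed into a different environment, and the install command would need to target the interpreter shown here instead.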