In Amazon EMR, I am using the following script as a custom bootstrap action to install Python packages. The script runs OK (I checked the logs, and the packages installed successfully), but when I open a notebook in JupyterLab, I cannot import any of them. If I open a terminal in JupyterLab and run pip list or pip3 list, none of my packages is there. Even if I go to / and run find . -name mleap, for instance, it does not exist.
Something I have noticed is that on the master node I keep getting an error saying bootstrap action 2 has failed (there is no second action, only one). According to this, it is a rare error, yet I get it in all my clusters. However, my cluster eventually gets created and I can use it.
My script is called aws-emr-bootstrap-actions.sh
#!/bin/bash
sudo python3 -m pip install numpy scikit-learn pandas mleap sagemaker boto3
I suspect it might have something to do with a Docker image being deployed that invalidates my previous installs, but judging from my Google searches, it is common to use bootstrap actions to install Python packages, so this should work...
The PYSPARK Python interpreter (the one that Spark is using) is different from the one into which the OP was installing the modules (as confirmed in comments).
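One way to check this diagnosis is to run something like the following in a notebook cell and compare the interpreter path it prints with the python3 that the bootstrap script uses. This is only a minimal sketch: describe_interpreter is a hypothetical helper name, and the module names are simply taken from the question.

```python
import importlib.util
import sys


def describe_interpreter(module_names):
    """Report the running interpreter and where each module would be imported from.

    A module maps to None when this interpreter cannot find it.
    (describe_interpreter is a hypothetical helper, not part of any library.)
    """
    modules = {}
    for name in module_names:
        spec = importlib.util.find_spec(name)
        if spec is None:
            modules[name] = None
        else:
            # Namespace packages have no single origin file.
            modules[name] = spec.origin or "<namespace package>"
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "modules": modules,
    }


if __name__ == "__main__":
    # Package names taken from the bootstrap script in the question.
    report = describe_interpreter(["numpy", "pandas", "mleap"])
    print("interpreter:", report["executable"], report["version"])
    for name, origin in report["modules"].items():
        print(f"{name}: {origin or 'NOT FOUND for this interpreter'}")
```

If the path printed here differs from what python3 -c "import sys; print(sys.executable)" reports inside the bootstrap action, the packages were most likely installed for another interpreter. Installing them with the pip that belongs to the notebook's interpreter would be the next thing to try.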