I am running EMR cluster(AWS) but I do not understand how notebook imports packages. I am running PySpark kernel.
import boto3
No module named 'boto3'
Traceback (most recent call last):
ModuleNotFoundError: No module named 'boto3'
print (sys.version) shows
3.7.6 (default, Feb 26 2020, 20:54:15)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-6)]
print(sys.executable) shows
/tmp/1594625399736-0/bin/python
I have both Conda and pip3 install of boto3.
How to solve this?
Are you using pyspark? If yes, then you need to install the packages in the spark context. Refer to this AWS document: https://aws.amazon.com/blogs/big-data/install-python-libraries-on-a-running-cluster-with-emr-notebooks/
similarly install any dependency packages if you see module not found error on import. Make sure the versions are compatible.
sc.list_packages()
Package Version
-------------------------- -------
beautifulsoup4 4.9.0
boto 2.49.0
cycler 0.10.0
jmespath 0.9.5
kiwisolver 1.2.0
lxml 4.5.0
matplotlib 3.2.2
mysqlclient 1.4.2
nltk 3.4.5
nose 1.3.4
numpy 1.19.0
pandas 1.0.5
pip 9.0.1
py-dateutil 2.2
py4j 0.10.9
pyparsing 2.4.7
pyspark 3.0.0
python-dateutil 2.8.1
python37-sagemaker-pyspark 1.3.0
pytz 2020.1
PyYAML 5.3.1
setuptools 28.8.0
six 1.15.0
soupsieve 1.9.5
wheel 0.29.0
windmill 1.6
I have boto.
sc.install_pypi_package("boto3")
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.