SageMaker notebook connected to EMR: import custom Python module
I looked through similar questions, but none of them solved my problem. I have a SageMaker notebook instance and opened a SparkMagic PySpark notebook connected to an AWS EMR cluster. I also have a SageMaker repository connected to this notebook, called dsci-Python.
The directory looks like this:
/home/ec2-user/SageMaker/dsci-Python
/home/ec2-user/SageMaker/dsci-Python/pyspark_mle/datalake_data_object/SomeClass
/home/ec2-user/SageMaker/dsci-Python/Pyspark_playground.ipynb
There is an __init__.py under both the pyspark_mle and datalake_data_object directories, and I have no problem importing them in other environments.
When I run this code in Pyspark_playground.ipynb:
from pyspark_mle.datalake_data_object.SomeClass.SomeClass import Something
I get: No module named 'pyspark_mle'
I think this is an environment path issue.
The repo is on your notebook instance, whereas the PySpark kernel is executing code on the EMR cluster.
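A quick way to see this split is to check where each cell actually runs. The sketch below assumes the standard SparkMagic %%local magic; the printed hostnames are just whatever your machines report.

# In a regular PySpark-kernel cell (runs on the EMR cluster via Livy):
import socket
print(socket.gethostname())  # EMR driver hostname

# In a separate cell prefixed with %%local (runs on the SageMaker notebook instance):
%%local
import socket
print(socket.gethostname())  # notebook instance hostname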
To access these local modules on the EMR cluster, you can clone the repository onto the EMR cluster.
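For example, assuming the repository has been cloned to /home/hadoop/dsci-Python on the EMR master node (a hypothetical path), a minimal sketch for making it importable from the PySpark kernel could look like this:

import sys

# Hypothetical location of the cloned repo on the EMR master node
sys.path.insert(0, "/home/hadoop/dsci-Python")

# Optionally ship the package to the executors as well, e.g. as a pre-built zip:
# sc.addPyFile("/home/hadoop/pyspark_mle.zip")

from pyspark_mle.datalake_data_object.SomeClass.SomeClass import Something

Adding to sys.path only affects the driver; sc.addPyFile (or a zip on --py-files) is what makes the package available to executor processes as well.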
Also, SparkMagic has a useful magic, send_to_spark, which can be used to send data from the notebook's local environment to the Spark kernel: https://github.com/jupyter-incubator/sparkmagic/blob/master/examples/Send%20local%20data%20to%20Spark.ipynb
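A minimal sketch of how that magic is typically used (the variable name here is made up; -t gives the type, str or df, and -n the name the variable receives on the Spark side):

# Cell 1: define a variable locally on the notebook instance
%%local
bucket_name = "my-example-bucket"

# Cell 2: send it to the remote Spark session
%%send_to_spark -i bucket_name -t str -n bucket_name

# Cell 3: a regular cell running on the EMR cluster can now see it
print(bucket_name)

Note that send_to_spark is meant for data (strings and pandas DataFrames), not for shipping Python modules; for the module itself, cloning the repo on the cluster as described above is the way to go.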