
SageMaker notebook connected to EMR: importing a custom Python module

I looked through similar questions but none of them solved my problem. I have a SageMaker notebook instance, and in it I opened a SparkMagic PySpark notebook connected to an AWS EMR cluster. I also have a SageMaker repo called dsci-Python connected to this notebook.

The directory looks like:

/home/ec2-user/SageMaker/dsci-Python
/home/ec2-user/SageMaker/dsci-Python/pyspark_mle/datalake_data_object/SomeClass
/home/ec2-user/SageMaker/dsci-Python/Pyspark_playground.ipynb

There is an __init__.py under both the pyspark_mle and datalake_data_object directories, and I have no problem importing them in other environments.

When I run this code in Pyspark_playground.ipynb:

from pyspark_mle.datalake_data_object.SomeClass.SomeClass import Something

I get: No module named 'pyspark_mle'

I think this is an environment path issue.
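It is indeed a path issue: an import like this only works when the package's parent directory is on sys.path of the interpreter actually running the code. A minimal, self-contained sketch of that mechanism (using throwaway directories that mimic the layout above, not the real repo):

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package tree that mimics pyspark_mle/datalake_data_object
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "pyspark_mle", "datalake_data_object")
os.makedirs(pkg_dir)

# Both levels need an __init__.py to be importable as a package
open(os.path.join(root, "pyspark_mle", "__init__.py"), "w").close()
open(os.path.join(pkg_dir, "__init__.py"), "w").close()

# Before the parent directory is on sys.path, the import fails...
try:
    import pyspark_mle  # noqa: F401
except ModuleNotFoundError as e:
    print(e)  # No module named 'pyspark_mle'

# ...and succeeds once the parent directory is added
sys.path.insert(0, root)
mod = importlib.import_module("pyspark_mle.datalake_data_object")
print(mod.__name__)  # pyspark_mle.datalake_data_object
```

In a SparkMagic PySpark notebook the interpreter running your cells lives on the EMR cluster, where /home/ec2-user/SageMaker/dsci-Python does not exist at all, so no sys.path tweak on the notebook instance can help.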

The repo is on your notebook instance, whereas the PySpark kernel is executing code on the EMR cluster.

To access these local modules on the EMR cluster, you can clone the repository onto the EMR cluster itself.
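For example, assuming you have SSH access to the EMR master node and the repository is reachable from the cluster (the repo URL and paths below are placeholders, not taken from the question), the setup might look like:

```shell
# On the EMR master node, e.g. after: ssh hadoop@<emr-master-public-dns>
sudo yum install -y git                # EMR runs Amazon Linux
git clone https://github.com/<org>/dsci-Python.git /home/hadoop/dsci-Python

# Make the package visible to the Python interpreter the kernel uses
export PYTHONPATH=/home/hadoop/dsci-Python:$PYTHONPATH
```

If code from the package also needs to run on the executors (not just the driver), one common approach is to zip the package and distribute it with SparkContext.addPyFile, e.g. sc.addPyFile("/home/hadoop/pyspark_mle.zip").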

Also, SparkMagic has a useful magic, send_to_spark, which can be used to send data from the notebook locally to the Spark kernel: https://github.com/jupyter-incubator/sparkmagic/blob/master/examples/Send%20local%20data%20to%20Spark.ipynb
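As a sketch of how that magic is used (the variable names here are made up; see the linked notebook for the authoritative usage), you first create a value in a local cell, then push it to the remote session:

```
%%local
import pandas as pd
local_df = pd.DataFrame({"a": [1, 2, 3]})
```

```
%%send_to_spark -i local_df -t df -n remote_df
```

After the second cell runs, a variable named remote_df is available in the Spark session on the cluster. Note this is for sending data, not modules — for your package itself, cloning the repo onto the cluster is the way to go.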

