
How can I import external Python libraries in a Python shell AWS Glue job?

I have been trying to import an external Python library in an AWS Glue Python shell job.

  1. I uploaded the whl file for pyodbc to S3.
  2. I referenced the S3 path under "Python library path" in the additional properties of the Glue job.
  3. I also tried passing the job parameter --extra-py-files with the S3 path of the whl file as its value.
  4. Whenever I write the line "from pyodbc import pyodbc as db" or just "import pyodbc", it always returns "ModuleNotFoundError: No module named 'pyodbc'".
  5. The logs are shown below:

Processing ./glue-python-libs-cq4p0rs8/pyodbc-4.0.32-cp310-cp310-win_amd64.whl
Installing collected packages: pyodbc
Successfully installed pyodbc-4.0.32

WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.

File "/tmp/glue-python-scripts-g_mt5xzp/Glue-ETL-Dev.py", line 2, in
ModuleNotFoundError: No module named 'pyodbc'

I am downloading the wheel files from here: https://pypi.org/project/pyodbc/#files

No matter which version of the whl file I reference in the Glue job, it always throws the same error.

Can anyone tell me where it's going wrong?

I tried to follow these guides [1], [2] from the official AWS documentation, but I ran into issues when importing some libraries, such as psycopg2. Finally, I managed to import the desired libraries by following the steps of this tutorial from the AWS blog [3]. The blog is in Spanish, but you may be able to translate it.

Basically, they create a setup.py script in which they define the required libraries. Afterwards, they generate a .whl file from it and upload that file to an S3 bucket, from which the Glue Python shell script gets the required libraries.
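As a rough illustration of that approach, a setup.py along these lines might work. This is only a sketch, not the exact script from the blog: the package name glue_python_shell_deps, the version, and the psycopg2-binary requirement are placeholders I picked for the example, and it assumes the Glue job can reach PyPI to resolve the listed dependencies when it installs the wheel.

```python
# setup.py: sketch of a dependency-only wheel for a Glue Python shell job.
# All names and versions below are hypothetical placeholders.
import sys

SETUP_KWARGS = {
    "name": "glue_python_shell_deps",         # hypothetical package name
    "version": "0.1",
    "packages": [],                           # no code of its own
    "install_requires": ["psycopg2-binary"],  # libraries the job needs
}

# Only invoke setuptools when a command such as bdist_wheel is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

Running "python setup.py bdist_wheel" should then produce a .whl file under dist/ (this needs setuptools and, on older setups, the wheel package). That file is what gets uploaded to S3 and referenced in the job's Python library path.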

[1] https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#aws-glue-programming-python-libraries-job

[2] https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#create-python-extra-library

[3] https://aws.amazon.com/es/blogs/aws-spanish/usando-python-shell-y-pandas-en-aws-glue-para-procesar-conjuntos-de-datos-pequenos-y-medianos/

