How can I import external Python libraries in an AWS Glue Python shell job?
I have been trying to import an external Python library in an AWS Glue Python shell job.
Processing ./glue-python-libs-cq4p0rs8/pyodbc-4.0.32-cp310-cp310-win_amd64.whl
Installing collected packages: pyodbc
Successfully installed pyodbc-4.0.32
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
File "/tmp/glue-python-scripts-g_mt5xzp/Glue-ETL-Dev.py", line 2, in <module>
ModuleNotFoundError: No module named 'pyodbc'
I am downloading the wheel files from here: https://pypi.org/project/pyodbc/#files
No matter which version of the .whl file I reference in the Glue job, it always throws the same error.
Can anyone tell me where it's going wrong?
I tried to follow guides [1] and [2] in the official AWS documentation, but I ran into issues when importing some libraries, such as psycopg2. In the end, I managed to import the libraries I needed by following the steps in this tutorial from the AWS blog [3]. The blog post is in Spanish, but you may be able to translate it.
Basically, they create a setup.py script in which they define the required libraries. They then generate a .whl file with those libraries and upload that file to an S3 bucket, from which the Glue Python shell script gets the required libraries.
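As a rough illustration of the setup.py step, here is a minimal sketch that writes such a file to disk. The helper name `write_setup_py`, the package name, the version, and the `psycopg2-binary` requirement are all placeholders I chose for the example; they are not taken from the blog post, so adjust them to your own job.

```python
import tempfile
from pathlib import Path

# Template for a minimal setup.py. The wheel built from it carries no code of
# its own (packages=[]); it only declares the libraries the job needs.
SETUP_PY_TEMPLATE = """\
from setuptools import setup

setup(
    name={name!r},
    version={version!r},
    packages=[],
    install_requires={requirements!r},
)
"""


def write_setup_py(project_dir, name, version, requirements):
    """Write a minimal setup.py into project_dir and return its path.

    Hypothetical helper for illustration only.
    """
    project_dir = Path(project_dir)
    project_dir.mkdir(parents=True, exist_ok=True)
    setup_path = project_dir / "setup.py"
    setup_path.write_text(
        SETUP_PY_TEMPLATE.format(
            name=name, version=version, requirements=list(requirements)
        ),
        encoding="utf-8",
    )
    return setup_path


if __name__ == "__main__":
    # Placeholder values; a temporary directory keeps the demo side-effect free.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_setup_py(tmp, "glue_python_shell_deps", "0.1", ["psycopg2-binary"])
        print(path.read_text(encoding="utf-8"))
```

From there, running something like `python setup.py bdist_wheel` in that directory should produce a .whl file under `dist/`, which is the file you would upload to S3 and reference from the job as described in [2]. I am assuming here that the job can reach PyPI to resolve the listed requirements when it installs the wheel.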
[1] https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#aws-glue-programming-python-libraries-job
[2] https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#create-python-extra-library
[3] https://aws.amazon.com/es/blogs/aws-spanish/usando-python-shell-y-pandas-en-aws-glue-para-procesar-conjuntos-de-datos-pequenos-y-medianos/