AWS Glue Python-Shell: How to provide your own library?

I'd like to have an AWS Glue Python shell job connect to an MS SQL Server. I understand that I should use the pymssql library. On my computer the script works, but with AWS I understand that I need to upload the pymssql library to S3 and reference it.

I'm following their example of how to provide your own egg file, which is written for connecting to Redshift, but after creating the egg file and running the script I get this error:

Couldn't find index page for 'redshift-module' (maybe misspelled?)

Can anyone show how to provide my own library, for either Redshift or MS SQL? I'm just looking for an example I can adapt and work from.

Full Job Log

Creating /glue/lib/installation/site.py
Processing redshift_module-0.1-py3.7.egg
Copying redshift_module-0.1-py3.7.egg to /glue/lib/installation
Adding redshift-module 0.1 to easy-install.pth file

Installed /glue/lib/installation/redshift_module-0.1-py3.7.egg
Processing dependencies for redshift-module==0.1
Searching for redshift-module==0.1
Reading https://pypi.org/simple/redshift-module/
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/

Full Error Output

Couldn't find index page for 'redshift-module' (maybe misspelled?)
No local packages or working download links found for redshift-module==0.1
error: Could not find suitable distribution for Requirement.parse('redshift-module==0.1')

The answer is mentioned here.

In a nutshell, AWS Glue uses Python 3.6, while the egg 'redshift_module-0.1-py3.7.egg' was built using Python 3.7.
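The mismatch is visible in the egg's file name, which carries the Python version it was built with. The following is a minimal sketch of a local check you could run before uploading; the helper names `egg_python_version` and `egg_matches_runtime` are made up for this example and are not part of Glue or setuptools.

```python
import re
import sys

# Egg file names follow the pattern name-version-pyX.Y[-platform].egg
_EGG_PY_TAG = re.compile(r"-py(\d+)\.(\d+)(?:-.+)?\.egg$")


def egg_python_version(egg_filename):
    """Return the (major, minor) Python version in an egg file name, or None."""
    match = _EGG_PY_TAG.search(egg_filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def egg_matches_runtime(egg_filename, runtime=None):
    """True if the egg was built for `runtime` (default: the current interpreter)."""
    if runtime is None:
        runtime = sys.version_info[:2]
    return egg_python_version(egg_filename) == tuple(runtime)


if __name__ == "__main__":
    egg = "redshift_module-0.1-py3.7.egg"
    print(egg_python_version(egg))           # (3, 7)
    print(egg_matches_runtime(egg, (3, 6)))  # False: built for 3.7, job runs 3.6
```

Passing the runtime explicitly, as in the last line, lets you compare against the version the job uses rather than the one installed on your own machine.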

You might also need to have a look at the documentation, which has some useful packaging options like install_requires=['package'].
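As an illustration of the install_requires option, a setup.py might look like the sketch below. The package name `my_module`, the version, and the `pymssql` dependency are placeholders chosen for this example and are not taken from the AWS documentation. The lazy import and the argument guard are only there so the metadata can be inspected without triggering a build.

```python
# setup.py: a minimal sketch; name, version and dependency are placeholders.
import sys

SETUP_ARGS = dict(
    name="my_module",              # hypothetical package name
    version="0.1",
    packages=["my_module"],        # expects a my_module/ directory next to setup.py
    install_requires=["pymssql"],  # dependencies to install alongside the package
)

# Only invoke setuptools when a command such as bdist_wheel is passed.
if __name__ == "__main__" and len(sys.argv) > 1:
    from setuptools import setup

    setup(**SETUP_ARGS)
```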

I faced the same issue while doing basic testing in a Glue job. On further investigation I noticed that Glue Python shell 3 uses Python 3.6 only. NOTE: From what I observed in this issue, egg files built with one version of Python do not work with another.

To avoid this, you need to build a wheel file, which (for a pure-Python package) is compatible with any Python 3 version.

  1. Run the following command in the directory where your setup.py file is located: $ python3 setup.py bdist_wheel

  2. Upload the wheel file to an S3 bucket.

  3. Go to the AWS Glue job console and create a new job. Fill in all required parameters, set the type to "Python Shell", and enter the S3 path of the wheel file in "Python library path".
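Steps 2 and 3 can also be scripted. The following is a rough sketch using boto3, assuming that the console's "Python library path" field corresponds to the `--extra-py-files` job parameter. The function names and all argument values (job name, role, bucket, keys) are hypothetical, and the second function needs boto3 and valid AWS credentials to run.

```python
def python_shell_job_args(job_name, role_arn, script_s3_path, wheel_s3_path):
    """Build keyword arguments for Glue's CreateJob call for a Python shell job."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",             # job type "Python Shell"
            "ScriptLocation": script_s3_path,  # S3 path of the job script
            "PythonVersion": "3",
        },
        # Assumed equivalent of the console's "Python library path" field.
        "DefaultArguments": {"--extra-py-files": wheel_s3_path},
    }


def upload_wheel_and_create_job(wheel_path, bucket, key,
                                job_name, role_arn, script_s3_path):
    """Upload a local wheel to S3, then create a job that references it."""
    import boto3  # imported lazily; only needed when actually calling AWS

    boto3.client("s3").upload_file(wheel_path, bucket, key)
    args = python_shell_job_args(
        job_name, role_arn, script_s3_path, f"s3://{bucket}/{key}"
    )
    return boto3.client("glue").create_job(**args)
```

Keeping the argument construction in its own function means the job definition can be inspected or printed without touching AWS.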
