
Import error: No module in AWS Glue job script - Python

I am trying to run custom Python code that requires libraries AWS Glue does not support (pandas). I created a zip file with the necessary libraries and uploaded it to an S3 bucket. When running the job, I pointed to the S3 path of the zip file in the advanced properties. Still, my job does not run successfully. Can anyone suggest why?

1. Do I have to include my code in the zip file? If yes, how will Glue know which file is the code?
2. Do I need to create a package, or will a plain zip file do?

Appreciate the help!

According to the AWS Glue documentation:

Only pure Python libraries can be used. Libraries that rely on C extensions, such as the pandas Python Data Analysis Library, are not yet supported.

I don't think it will work even if you upload the Python library as a zip file, if the library you are using depends on C extensions. I tried using pandas, holidays, etc. the same way you did, and when I contacted AWS Support they said that support for these Python libraries is on their to-do list, but there is no ETA as of now.

So, at this point, any library that is not pure Python will not work in AWS Glue. I expect support to arrive in the near future, since this is a popular request.
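To make the "pure Python" restriction above a bit more concrete, here is a rough check you could run on a zip before uploading it. It is only a heuristic sketch, not anything Glue itself provides: the function name and the list of suffixes are my own assumptions, and a library can still fail for other reasons.

```python
import zipfile

# Assumption: these suffixes usually indicate compiled (non pure-Python) code.
COMPILED_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def find_compiled_members(zip_path):
    """Return the archive members that look like compiled extensions.

    Hypothetical helper for illustration; an empty list suggests (but does
    not prove) that the zip contains only pure-Python code.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith(COMPILED_SUFFIXES)
        ]
```

A zip built from pandas would typically list several `.so` files here, which matches the restriction quoted from the documentation.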

If you would still like to try it, please refer to this link, which explains how to package external libraries to run in AWS Glue. I tried it, but it didn't work for me.
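For a library that is pure Python, the packaging step can be sketched roughly as follows. This assumes the library is a package directory containing an `__init__.py`, and that the zip is passed to the job through the `--extra-py-files` job parameter, which as far as I know is what the Python library path setting maps to. The helper names and the S3 URI are made up for illustration.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip a package directory so the package folder sits at the archive root.

    Hypothetical helper; zip_path should live outside package_dir.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for filename in files:
                if filename.endswith(".pyc"):
                    continue  # skip bytecode caches
                full_path = os.path.join(root, filename)
                # Store paths relative to the parent so imports resolve
                # as "import <package name>" once the zip is on sys.path.
                archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


def extra_py_files_argument(s3_uri):
    """Build the job argument that points Glue at the uploaded zip."""
    return {"--extra-py-files": s3_uri}
```

With this layout the job script itself stays out of the zip: the script is uploaded separately as the job's script location, and the zip only carries the libraries it imports.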

Update: a change to AWS Glue jobs was released on 22 Jan 2019.

Introducing Python Shell Jobs in AWS Glue -- Posted On: Jan 22, 2019

Python shell jobs in AWS Glue support scripts that are compatible with Python 2.7 and come pre-loaded with libraries such as the Boto3, NumPy, SciPy, pandas, and others. You can run Python shell jobs using 1 DPU (Data Processing Unit) or 0.0625 DPU (which is 1/16 DPU). A single DPU provides processing capacity that consists of 4 vCPUs of compute and 16 GB of memory.
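As a rough illustration of the announcement above, the settings for such a job could be assembled like this. It is a sketch under assumptions: the dictionary keys follow my understanding of the boto3 Glue `create_job` parameters, the function names are hypothetical, and the resource figures are plain proportional arithmetic on the quoted numbers rather than something the announcement states.

```python
# Figures quoted in the announcement.
VCPUS_PER_DPU = 4
MEMORY_GB_PER_DPU = 16
ALLOWED_CAPACITIES = (0.0625, 1.0)  # 1/16 DPU or 1 DPU


def python_shell_job_settings(name, role, script_s3_uri, max_capacity=0.0625):
    """Build keyword arguments for a Python shell job (hypothetical helper)."""
    if max_capacity not in ALLOWED_CAPACITIES:
        raise ValueError("Python shell jobs run on 0.0625 or 1 DPU")
    return {
        "Name": name,
        "Role": role,
        "Command": {
            "Name": "pythonshell",  # as opposed to a Spark ETL job
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "2",  # the announcement mentions Python 2.7
        },
        "MaxCapacity": max_capacity,
    }


def proportional_resources(dpu):
    """Scale the per-DPU figures linearly (an assumption, see above)."""
    return dpu * VCPUS_PER_DPU, dpu * MEMORY_GB_PER_DPU
```

The resulting dictionary is meant to be passed along the lines of `boto3.client("glue").create_job(**settings)`. Under the linear assumption, 0.0625 DPU works out to a quarter of a vCPU and 1 GB of memory.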

More info at: https://aws.amazon.com/about-aws/whats-new/2019/01/introducing-python-shell-jobs-in-aws-glue/

https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html

As Yuva's answer mentioned, I believe it is currently impossible to import a library that is not pure Python, and the documentation reflects that.

However, in case someone came here looking for how to import a Python library in AWS Glue in general, there is a good explanation of how to do it with the pg8000 library in this post: AWS Glue - Truncate destination postgres table prior to insert


 