
AWS Glue Python Shell package import

We created a Python shell job that connects to Redshift and fetches data. The program below works fine on my local system. The program and the steps are given below.

Program:

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

# >>>>>>>> MAKE CHANGES HERE <<<<<<<<
DATABASE = "#####"
USER = "#####"
PASSWORD = "#####"
HOST = "#####.redshift.amazonaws.com"
PORT = "5439"
SCHEMA = "test"  # default is "public"

# ---- connection and session creation ----
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER, PASSWORD, HOST, PORT, DATABASE)
engine = sa.create_engine(connection_string)
Session = sessionmaker(bind=engine)
s = Session()
# Wrap raw SQL in sa.text() so it also runs on newer SQLAlchemy releases.
s.execute(sa.text("SET search_path TO %s" % SCHEMA))

# ---- write queries from here ----
query = sa.text("SELECT * FROM test1 LIMIT 2;")
all_results = s.execute(query).fetchall()


def pretty(rows):
    """Print every column value of every row."""
    for row in rows:
        print("row start >>>>>>>>>>>>>>>>>>>>")
        for value in row:
            print(" ---- %s" % (value,))
        print("row end >>>>>>>>>>>>>>>>>>>>>>")


pretty(all_results)

# ---- close the session at the end ----
s.close()

Steps:

  • sudo pip install psycopg2
  • sudo pip install sqlalchemy
  • sudo pip install sqlalchemy-redshift

I uploaded the files psycopg2-2.8.4-cp27-cp27m-win32.whl, Flask_SQLAlchemy-2.4.1-py2.py3-none-any.whl and sqlalchemy_redshift-0.7.5-py2.py3-none-any.whl to S3 (s3://####/lib/) and set that folder as the Python library path in the AWS Glue job.

When I run the program, the error below occurs.

Traceback (most recent call last):
  File "/tmp/runscript.py", line 113, in <module>
    download_and_install(args.extra_py_files)
  File "/tmp/runscript.py", line 56, in download_and_install
    download_from_s3(s3_file_path, local_file_path)
  File "/tmp/runscript.py", line 81, in download_from_s3
    s3.download_file(bucket_name, s3_key, new_file_path)
  File "/usr/local/lib/python2.7/site-packages/boto3/s3/inject.py", line 172, in download_file
    extra_args=ExtraArgs, callback=Callback)
  File "/usr/local/lib/python2.7/site-packages/boto3/s3/transfer.py", line 307, in download_file
    future.result()
  File "/usr/local/lib/python2.7/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/usr/local/lib/python2.7/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found

PS: The Glue job role has full access to S3.

Please suggest how to map those libraries to the program.
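
One thing worth checking with a 404 on HeadObject: the traceback shows the run script fetching each configured path with s3.download_file, so each entry probably has to be the full key of a single .whl or .egg object rather than a folder prefix. A minimal sketch of such a check in Python follows; the helper names split_s3_uri and find_suspect_paths and the bucket name are made up for illustration, and the check only looks at the strings, it does not contact S3.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    if not uri.startswith("s3://"):
        raise ValueError("not an S3 URI: %r" % uri)
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


def find_suspect_paths(library_path):
    """Return entries of a comma-separated library path that do not
    look like a single .whl or .egg object (for example a folder)."""
    suspects = []
    for entry in library_path.split(","):
        entry = entry.strip()
        if not entry:
            continue
        _, key = split_s3_uri(entry)
        if not key.endswith((".whl", ".egg")):
            suspects.append(entry)
    return suspects


# Hypothetical bucket name: the folder prefix is flagged, the wheel is not.
print(find_suspect_paths(
    "s3://my-bucket/lib/,"
    "s3://my-bucket/lib/sqlalchemy_redshift-0.7.5-py2.py3-none-any.whl"
))
```

Any entry this reports is a candidate for the path that HeadObject could not find.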

You can specify your own Python libraries, packaged as .egg or .whl files, with the "--extra-py-files" flag, as shown in the example below.

Command line example:

aws glue create-job --name python-redshift-test-cli --role role --command '{"Name" :  "pythonshell", "ScriptLocation" : "s3://MyBucket/python/library/redshift_test.py"}' \
     --connections Connections=connection-name --default-arguments '{"--extra-py-files" : ["s3://MyBucket/python/library/redshift_module-0.1-py2.7.egg", "s3://MyBucket/python/library/redshift_module-0.1-py2.7-none-any.whl"]}'

Reference: Create a glue job with extra python library
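
For anyone creating the job from Python instead of the CLI, here is a rough sketch of the same request using boto3's Glue client. The helper build_create_job_kwargs is a made-up name, the job, role, bucket and connection names are the placeholders from the command above, and the sketch assumes the library list is passed as one comma-separated string. The actual API call is left as a comment because it needs AWS credentials.

```python
def build_create_job_kwargs(name, role, script_location, extra_py_files,
                            connection_name=None):
    """Build keyword arguments for the boto3 Glue client's create_job call."""
    kwargs = {
        "Name": name,
        "Role": role,
        "Command": {"Name": "pythonshell", "ScriptLocation": script_location},
        # Assumption: library paths are joined with commas and no spaces.
        "DefaultArguments": {"--extra-py-files": ",".join(extra_py_files)},
    }
    if connection_name:
        kwargs["Connections"] = {"Connections": [connection_name]}
    return kwargs


kwargs = build_create_job_kwargs(
    name="python-redshift-test-cli",
    role="role",
    script_location="s3://MyBucket/python/library/redshift_test.py",
    extra_py_files=[
        "s3://MyBucket/python/library/redshift_module-0.1-py2.7.egg",
        "s3://MyBucket/python/library/redshift_module-0.1-py2.7-none-any.whl",
    ],
    connection_name="connection-name",
)
print(kwargs["DefaultArguments"]["--extra-py-files"])

# To actually create the job (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("glue").create_job(**kwargs)
```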

There is a simple way to import Python dependencies using .whl files, which can be found on the Python site for the particular module.

You can also add multiple wheel files from S3, separated by commas.

For example: "s3://xxxxxxxxx/common/glue/glue_whl/fastparquet-0.4.1-cp37-cp37m-macosx_10_9_x86_64.whl,s3://xxxxxx/common/glue/glue_whl/packaging-20.4-py2.py3-none-any.whl,s3://xxxxxx/common/glue/glue_whl/s3fs-0.5.0-py3-none-any.whl"
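
A related point: a wheel file name ends with the platform it was built for (the last field in the PEP 427 naming scheme). Wheels tagged "any" are pure Python, while a wheel tagged win32 or macosx_... is built for that operating system and may not install in the Glue environment. That environment is assumed here to be Linux, going by the /usr/local/lib/python2.7 paths in the traceback. A small sketch for spotting platform-specific wheels in a list; the helper names are invented for illustration.

```python
def wheel_platform_tag(path):
    """Return the platform tag from a wheel file name or S3 path."""
    filename = path.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    # name-version[-build]-python-abi-platform.whl: platform is the last field
    return filename[:-len(".whl")].split("-")[-1]


def platform_specific_wheels(paths):
    """Return the wheels whose platform tag is not 'any'."""
    return [p for p in paths if wheel_platform_tag(p) != "any"]


wheels = [
    "psycopg2-2.8.4-cp27-cp27m-win32.whl",
    "sqlalchemy_redshift-0.7.5-py2.py3-none-any.whl",
    "fastparquet-0.4.1-cp37-cp37m-macosx_10_9_x86_64.whl",
    "s3fs-0.5.0-py3-none-any.whl",
]
for wheel in platform_specific_wheels(wheels):
    print("%s is built for %s" % (wheel, wheel_platform_tag(wheel)))
```

Wheels flagged this way would likely need to be swapped for a build that matches the job's environment, or for a pure-Python alternative.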

