
Unload to S3 with Python using IAM Role credentials

In Redshift, I run the following to unload data from a table into a file in S3:

unload('select * from table')
to 's3://bucket/unload/file_'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role_name>'

I would like to do the same in Python. Any suggestions on how to replicate this? I have seen examples using an access key and secret, but that is not an option for me; I need to use role-based credentials on a non-public bucket.

You will need two sets of credentials: IAM credentials, supplied via an IAM Role, so that Redshift can access the S3 bucket, and Redshift database login credentials to execute SQL commands.

Create a Python program that connects to Redshift, in a manner similar to other databases such as SQL Server, and execute your query. This program will need Redshift login credentials (a Redshift username and password), not IAM credentials.

The IAM credentials for S3 are assigned as a role to Redshift so that Redshift can store the results in S3. This is the iam_role 'arn:aws:iam::<aws-account-id>:role/<role_name>' part of the Redshift query in your question.
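For example, the UNLOAD statement can be built as a plain string in Python with the role ARN held in a variable. A minimal sketch, where the bucket, prefix, and ARN are placeholders (note the double colon after "iam": the region field of an IAM role ARN is empty):

IAM_ROLE_ARN = "arn:aws:iam::<aws-account-id>:role/<role_name>"

unload_sql = (
    "unload('select * from table') "
    "to 's3://bucket/unload/file_' "
    "iam_role '%s';" % IAM_ROLE_ARN
)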

You do not need boto3 (or boto) to access Redshift, unless you plan to interface with the Redshift API itself (which does not access the database stored inside Redshift).
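That said, if you later want role-based database credentials as well, the Redshift API (via boto3) can mint temporary ones. A minimal sketch, assuming a hypothetical cluster named my-cluster in us-east-1 and an IAM identity with the redshift:GetClusterCredentials permission:

import boto3

# Ask the Redshift API (not the database) for short-lived credentials.
client = boto3.client("redshift", region_name="us-east-1")
creds = client.get_cluster_credentials(
    DbUser="username",
    DbName="dbname",
    ClusterIdentifier="my-cluster",   # hypothetical cluster identifier
    DurationSeconds=3600,
)
# creds["DbUser"] and creds["DbPassword"] can then replace the stored
# username/password in the connection string of the program below.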

Here is an example Python program to access Redshift. Credit is due to Varun Verma.

There are other examples on the Internet to help you get started.

############ REQUIREMENTS ####################
# sudo apt-get install python3-pip
# sudo apt-get install libpq-dev
# sudo pip3 install psycopg2
# sudo pip3 install sqlalchemy
# sudo pip3 install sqlalchemy-redshift
##############################################

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

#>>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<< 
DATABASE = "dbname"
USER = "username"
PASSWORD = "password"
HOST = "host"
PORT = ""
SCHEMA = "public"      #default is "public" 

####### connection and session creation ############## 
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(sa.text(SetPath))   # sa.text() keeps this working on SQLAlchemy 1.4+/2.0
###### All set; session created using the provided schema #######

################ write queries from here ###################### 
query = "unload('select * from table') to 's3://bucket/unload/file_' iam_role 'arn:aws:iam:<aws-account-id>:role/<role_name>';"
rr = s.execute(query)
all_results =  rr.fetchall()

def pretty(all_results):
    for row in all_results:
        print("row start >>>>>>>>>>>>>>>>>>>>")
        for r in row:
            print(" ----", r)
        print("row end >>>>>>>>>>>>>>>>>>>>>>")


pretty(all_results)


########## close session in the end ###############
s.close()
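As an optional check that the unload actually produced files, you can list the target prefix with boto3. This is a sketch and assumes the environment running it has IAM credentials allowed to list the bucket; the bucket name and prefix are the same placeholders used in the query above:

import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="bucket", Prefix="unload/file_")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])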
