
Unable to copy data into AWS RedShift

I have tried a lot, but I am unable to copy data available as a JSON file in an S3 bucket (I have read-only access to the bucket) into a Redshift table using Python (the code below uses psycopg2). Below is the Python code I am using to copy the data. Using the same code, I was able to create the tables into which I am trying to copy.

import configparser
import psycopg2
from sql_queries import create_table_queries, drop_table_queries


def drop_tables(cur, conn):
    for query in drop_table_queries:
        cur.execute(query)
        conn.commit()


def create_tables(cur, conn):
    for query in create_table_queries:
        cur.execute(query)
        conn.commit()


def main():
    try:
        config = configparser.ConfigParser()
        config.read('dwh.cfg')

        # conn = psycopg2.connect("host={} dbname={} user={} password={} port={}".format(*config['CLUSTER'].values()))
        conn = psycopg2.connect(
            host=config.get('CLUSTER', 'HOST'),
            database=config.get('CLUSTER', 'DB_NAME'),
            user=config.get('CLUSTER', 'DB_USER'),
            password=config.get('CLUSTER', 'DB_PASSWORD'),
            port=config.get('CLUSTER', 'DB_PORT')
        )

        cur = conn.cursor()

        #drop_tables(cur, conn)
        #create_tables(cur, conn)
        qry = """copy DWH_STAGE_SONGS_TBL
             from 's3://udacity-dend/song-data/A/A/A/TRAAACN128F9355673.json'
             iam_role 'arn:aws:iam::xxxxxxx:role/MyRedShiftRole'
             format as json 'auto';"""
        print(qry)
        cur.execute(qry)
        # execute a statement
        # print('PostgreSQL database version:')
        # cur.execute('SELECT version()')
        #
        # # display the PostgreSQL database server version
        # db_version = cur.fetchone()
        # print(db_version)
        print("Executed successfully")

        cur.close()
        conn.close()

        # close the communication with the PostgreSQL

    except Exception as error:
        print("Error while processing")
        print(error)


if __name__ == "__main__":
    main()

I don't see any error in the PyCharm console, but I see an Aborted status in the Redshift query console. I can't see any reason why it was aborted (or I don't know where to look for one).

[Screenshot: Redshift query console showing the Aborted status]

Another thing I have noticed is that when I run the COPY statement in the Redshift query editor, it runs fine and the data is loaded into the table. I tried deleting and recreating the cluster, but no luck. I cannot figure out what I am doing wrong. Thank you.

Quick read - it looks like you haven't committed the transaction, and the COPY is rolled back when the connection closes. You need to either configure the connection for autocommit or add an explicit commit().
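Applying that fix to the question's code, the minimal change is either `conn.autocommit = True` right after connecting, or `conn.commit()` after executing the COPY. A sketch (the table name, S3 path, and role ARN here are just the question's placeholders; `conn` is assumed to be a psycopg2 connection):

```python
def build_copy_stmt(table, s3_path, iam_role):
    # Build the Redshift COPY statement for a JSON source,
    # matching the form used in the question.
    return (
        f"copy {table}\n"
        f"from '{s3_path}'\n"
        f"iam_role '{iam_role}'\n"
        f"format as json 'auto';"
    )


def run_copy(conn, stmt):
    # Execute the COPY and commit explicitly, so Redshift does
    # not roll the load back when the connection closes.
    with conn.cursor() as cur:
        cur.execute(stmt)
    conn.commit()


# Alternative: enable autocommit once, right after connecting,
# and then no explicit commit() is needed:
#   conn = psycopg2.connect(...)
#   conn.autocommit = True
```

Either approach makes the load visible in the Redshift query console instead of showing Aborted.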
