简体   繁体   中英

Unable to copy data into AWS RedShift

I tried a lot however I am unable to copy data available as json file in S3 bucket(I have read only access to the bucket) to Redshift table using python boto3. Below is the python code which I am using to copy the data. Using the same code I was able to create the tables in which I am trying to copy.

import configparser
import psycopg2
from sql_queries import create_table_queries, drop_table_queries


def drop_tables(cur, conn):
    for query in drop_table_queries:
        cur.execute(query)
        conn.commit()


def create_tables(cur, conn):
    for query in create_table_queries:
        cur.execute(query)
        conn.commit()


def main():
    try:
        config = configparser.ConfigParser()
        config.read('dwh.cfg')

        # conn = psycopg2.connect("host={} dbname={} user={} password={} port={}".format(*config['CLUSTER'].values()))
        conn = psycopg2.connect(
            host=config.get('CLUSTER', 'HOST'),
            database=config.get('CLUSTER', 'DB_NAME'),
            user=config.get('CLUSTER', 'DB_USER'),
            password=config.get('CLUSTER', 'DB_PASSWORD'),
            port=config.get('CLUSTER', 'DB_PORT')

        )

        cur = conn.cursor()

        #drop_tables(cur, conn)
        #create_tables(cur, conn)
        qry = """copy DWH_STAGE_SONGS_TBL
             from 's3://udacity-dend/song-data/A/A/A/TRAAACN128F9355673.json'
             iam_role 'arn:aws:iam::xxxxxxx:role/MyRedShiftRole'
             format as json 'auto';"""
        print(qry)
        cur.execute(qry)
        # execute a statement
        # print('PostgreSQL database version:')
        # cur.execute('SELECT version()')
        #
        # # display the PostgreSQL database server version
        # db_version = cur.fetchone()
        # print(db_version)
        print("Executed successfully")

        cur.close()
        conn.close()

        # close the communication with the PostgreSQL

    except Exception as error:
        print("Error while processing")
        print(error)


if __name__ == "__main__":
    main()

I don't see any error in the Pycharm console but I see Aborted status in the redshift query console. I don't see any reason why it has been aborted(or I don't know where to look for that)

在此处输入图像描述

Other thing that I have noticed is when I run the copy statement in Redshift query editor, it runs fine and data gets moved into the table. I tried to delete and recreate the cluster but no luck. I am not able to figure what I am doing wrong. Thank you

Quick read - it looks like you haven't committed the transaction and the COPY is rolled back when the connection closes. You need to either change the connection configuration to be in "autocommit" or add an explicit "commit()".

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM