简体   繁体   中英

How to connect Amazon Redshift to python

This is my python code and I want to connect my Amazon Redshift database to Python, but it is showing error in host.

Can anyone tell me the correct syntax? Am I passing all the parameters correctly?

con=psycopg2.connect("dbname = pg_table_def, host=redshifttest-icp.cooqucvshoum.us-west-2.redshift.amazonaws.com, port= 5439, user=me, password= secret")

This is the error:

OperationalError: could not translate host name "redshift://redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com," to address: Unknown host

It appears that you wish to run Amazon Redshift queries from Python code.

The parameters you would want to use are:

  • dbname : This is the name of the database you entered in the Database name field when the cluster was created.
  • user: This is you entered in the Master user name field when the cluster was created.
  • password: This is you entered in the Master user password field when the cluster was created.
  • host: This is the Endpoint provided in the Redshift management console (without the port at the end): redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com
  • port: 5439

For example:

con=psycopg2.connect("dbname=sales host=redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com port=5439 user=master password=secret")

Old question but I just arrived here from Google.

The accepted answer doesn't work with SQLAlchemy, although it's powered by psycopg2:

sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'dbname=... host=... port=... user=... password=...'

What worked:

create_engine(f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}")

Which works with psycopg2 directly too:

psycopg2.connect(f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}")

Using the postgresql dialect works because Amazon Redshift is based on PostgreSQL .

Hope it can help other people

To connect to redshift, you need the postgres+psycopg2 Install it as For Python 3.x:

pip3 install psycopg2-binary

And then use

return create_engine(
        "postgresql+psycopg2://%s:%s@%s:%s/%s"
        % (REDSHIFT_USERNAME, urlquote(REDSHIFT_PASSWORD), REDSHIFT_HOST, RED_SHIFT_PORT,
           REDSHIFT_DB,)
    )

The easiest way to query AWS Redshift from python is through this Jupyter extension - Jupyter Redshift

Not only can you query and save your results but also write them back to the database from within the notebook environment.

Well, for Redshift the idea is made COPY from S3, is faster than every different way, but here is some example to do it:

first you must install some dependencies

for linux users sudo apt-get install libpq-dev

for mac users brew install libpq

install with pip this dependencies pip3 install psycopg2-binary pip3 install sqlalchemy pip3 install sqlalchemy-redshift

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker


#>>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<<
DATABASE = "dwtest"
USER = "youruser"
PASSWORD = "yourpassword"
HOST = "dwtest.awsexample.com"
PORT = "5439"
SCHEMA = "public"

S3_FULL_PATH = 's3://yourbucket/category_pipe.txt'
ARN_CREDENTIALS = 'arn:aws:iam::YOURARN:YOURROLE'
REGION = 'us-east-1'

############ CONNECTING AND CREATING SESSIONS ############
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(SetPath)
###########################################################



############ RUNNING COPY ############
copy_command = '''
copy category from '%s'
credentials 'aws_iam_role=%s'
delimiter '|' region '%s';
''' % (S3_FULL_PATH, ARN_CREDENTIALS, REGION)
s.execute(copy_command)
s.commit()
######################################



############ GETTING DATA ############
query = "SELECT * FROM category;"
rr = s.execute(query)
all_results =  rr.fetchall()

def pretty(all_results):
    for row in all_results :
        print("row start >>>>>>>>>>>>>>>>>>>>")
        for r in row :
            print(" ---- %s" % r)
        print("row end >>>>>>>>>>>>>>>>>>>>>>")

pretty(all_results)
s.close()
######################################

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM