I would like to read data from a Redshift table, load it into a DataFrame, and perform transformations. I used psycopg2 to connect to Redshift and pandas read_sql to query the table, as below:
import pandas as pd
import psycopg2

con = psycopg2.connect("dbname=sales host=redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com port=5439 user=master password=secret")
cur = con.cursor()
sql = "select * from dtw.rpt_account_transfer_hist where transfer_date>=2020-07-01;"
df = pd.read_sql(sql, con)
I get an empty DataFrame, even though data exists when I query the database directly. When I print the schema, every column is a non-null object.
I then parameterized transfer_date as below and tried again. This time the whole data set is returned, with no filter applied. I'm not sure what I'm missing. I also tried a cast in the SQL query itself, but that returned an empty DataFrame. Any leads, please?
from datetime import datetime, timedelta

curr_dt = datetime.strftime(datetime.now() - timedelta(3), '%Y-%m-%d')
sql = "select * from dtw.rpt_account_transfer_hist where transfer_date>=" +str(curr_dt)+";"
df = pd.read_sql(sql, con)
The data in the Redshift table looks like the sample below. The datatype is varchar for col1, col2, col4, and col5, and date for transfer_date.
col1 col2 transfer_date col4 col5
6052148 670018 2020-07-13 640033 6052148
5260969 640737 2020-07-11 640033 5260969
4778065 610050 2020-07-11 610017 4778065
7942224 690020 2020-07-11 690032 7942224
5260969 640737 2020-07-10 640033 5260969
4778065 610050 2020-07-10 610017 4778065
7942224 690020 2020-07-10 690032 7942224
5073022 640601 2020-07-09 640679 5073022
0309991 640601 2020-07-09 640729 0309991
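I think you're missing single quotes around the date. Without them, 2020-07-01 is read as the arithmetic expression 2020 - 07 - 01 (that is, 2012) rather than as a date literal. Try this: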
sql = "select * from dtw.rpt_account_transfer_hist where transfer_date>='2020-07-01';"
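If the date has to be computed at run time, as in your second attempt, one option is to let the driver do the quoting instead of concatenating strings. Here is a minimal sketch, assuming the same table and an open psycopg2 connection `con`. The helper names `build_transfer_query` and `load_recent_transfers` are made up for illustration, and `%s` is psycopg2's placeholder style, which pandas forwards through the `params` argument of `read_sql`:

```python
from datetime import date, timedelta


def build_transfer_query(days_back=3, today=None):
    """Return (sql, params) for rows from the last `days_back` days.

    Hypothetical helper; `today` is only there to make testing easy.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=days_back)
    # %s is a driver placeholder, not Python string formatting. psycopg2
    # quotes the value itself, so the date is never pasted into the SQL
    # text where it could be read as 2020 - 07 - 01.
    sql = (
        "select * from dtw.rpt_account_transfer_hist "
        "where transfer_date >= %s;"
    )
    return sql, (cutoff,)


def load_recent_transfers(con, days_back=3):
    """Run the query on an open psycopg2 connection and return a DataFrame."""
    import pandas as pd  # imported here so the builder works without pandas

    sql, params = build_transfer_query(days_back)
    return pd.read_sql(sql, con, params=params)
```

With the connection from the question, that would be called as `df = load_recent_transfers(con, days_back=3)`. Passing the value separately also avoids SQL injection if the date ever comes from user input.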
Sounds a bit weird, I haven't changed anything and it started working.