We have a setup that syncs RDS Postgres changes into S3 using DMS. Now I want to run ETL on this S3 data (in Parquet), using Glue as the scheduler.
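Since Glue is only acting as the scheduler here, the schedule itself can be a Glue trigger. This is a minimal sketch, assuming the trigger is created with boto3's glue.create_trigger(); the trigger name, job name and cron expression are hypothetical placeholders.

```python
def scheduled_trigger_args(trigger_name: str, job_name: str, cron_expr: str) -> dict:
    """Build keyword arguments for boto3's glue.create_trigger() (hypothetical helper)."""
    # Glue schedules use the six-field AWS cron syntax, e.g. "0 2 * * ? *"
    if len(cron_expr.split()) != 6:
        raise ValueError("expected a six-field AWS cron expression")
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron_expr})",
        # Start the (hypothetical) ETL job each time the schedule fires
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


# Usage (requires boto3 and AWS credentials, so it is not run here):
#   import boto3
#   boto3.client("glue").create_trigger(
#       **scheduled_trigger_args("nightly-etl", "spectrum-etl", "0 2 * * ? *"))
```

Keeping the arguments in a plain dict makes the schedule easy to review and test without touching AWS.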
My plan is to write SQL queries for the transformations, execute them on Redshift Spectrum, and UNLOAD the results back into S3 in Parquet format. I don't want to use Glue Spark, as my data loads do not require that kind of capacity.
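The transform-then-unload step can be expressed as a single Redshift UNLOAD statement wrapped around the transformation query. A minimal sketch, assuming an external (Spectrum) schema called spectrum with an orders table, a bucket called my-bucket and an IAM role ARN; all of these names are hypothetical.

```python
def build_unload_sql(select_sql: str, s3_prefix: str, iam_role_arn: str,
                     overwrite: bool = False) -> str:
    """Wrap a SELECT (e.g. over a Spectrum external table) in an UNLOAD to Parquet."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must be an s3:// URI")
    # UNLOAD takes the query as a quoted string literal, so any single quotes
    # inside the query are escaped by doubling them.
    escaped = select_sql.replace("'", "''")
    sql = (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS PARQUET"
    )
    if overwrite:
        # Replace files already under the prefix instead of failing
        sql += " ALLOWOVERWRITE"
    return sql


# Hypothetical transformation over a hypothetical Spectrum table
transform = ("SELECT order_id, SUM(amount) AS total FROM spectrum.orders "
             "WHERE status = 'done' GROUP BY order_id")
print(build_unload_sql(transform, "s3://my-bucket/curated/order_totals/",
                       "arn:aws:iam::123456789012:role/MySpectrumRole"))
```

Generating the statement in Python keeps each transformation as plain SQL while the quoting rules for UNLOAD live in one place.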
However, I am having trouble connecting to Redshift from Glue, mainly library version issues and finding the right wheel (.whl) files for pg8000/psycopg2. Has anyone implemented something like this, and how did you manage the database connections from a Glue Python shell job?
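Much of the version trouble comes from wheel tags: a compiled wheel such as psycopg2-binary only installs on the CPython version named in its tag, while a pure-Python driver like pg8000 ships a py3 wheel. A small sketch that checks a wheel filename against a runtime, assuming the standard wheel filename layout; it deliberately ignores abi3 and platform tags, and the runtime mapping reflects the Python shell versions documented by AWS (check the current docs).

```python
# Glue Python shell "PythonVersion" values and the CPython they run (assumption:
# "3" selects Python 3.6 and "3.9" selects Python 3.9)
GLUE_PYTHON_SHELL_RUNTIMES = {"3": (3, 6), "3.9": (3, 9)}


def wheel_python_tag(filename: str) -> str:
    """Return the Python tag (e.g. 'cp36') from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    # {dist}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return parts[-3]


def wheel_matches_runtime(filename: str, major: int, minor: int) -> bool:
    """Simplified check: exact CPython tag or a generic pyN tag."""
    candidates = wheel_python_tag(filename).split(".")  # e.g. 'py2.py3'
    return f"cp{major}{minor}" in candidates or f"py{major}" in candidates


whl = "psycopg2_binary-2.9.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
print(wheel_matches_runtime(whl, *GLUE_PYTHON_SHELL_RUNTIMES["3"]))
```

A mismatch here (for example a cp36 wheel on a 3.9 runtime) is a common reason an import fails inside the job.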
I'm doing something similar in a Python shell job, but with Postgres instead of Redshift.
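Because Redshift accepts PostgreSQL client connections, the same psycopg2 approach can be pointed at a Redshift cluster. A minimal sketch, assuming psycopg2 is importable in the job and that credentials arrive as plain arguments (in practice they might come from Secrets Manager or a Glue connection); run_statements and run_on_redshift are hypothetical helper names.

```python
from typing import Callable, Iterable


def run_statements(execute: Callable[[str], object], statements: Iterable[str]) -> int:
    """Execute each non-blank SQL statement in order; return how many ran."""
    count = 0
    for sql in statements:
        if sql.strip():
            execute(sql)
            count += 1
    return count


def run_on_redshift(statements: Iterable[str], host: str, dbname: str,
                    user: str, password: str, port: int = 5439) -> int:
    # Imported here so run_statements works even where psycopg2 is not installed;
    # inside the Glue job the psycopg2 wheel must be supplied to the job.
    import psycopg2

    conn = psycopg2.connect(host=host, port=port, dbname=dbname,
                            user=user, password=password, sslmode="require")
    try:
        # Autocommit, because Redshift refuses some statements
        # (e.g. CREATE EXTERNAL TABLE) inside a transaction block.
        conn.autocommit = True
        with conn.cursor() as cur:
            return run_statements(cur.execute, statements)
    finally:
        conn.close()
```

Passing cur.execute as a callable keeps the statement loop independent of the driver, so switching to pg8000 only changes how the connection is opened.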
This is the wheel file I use:
psycopg2_binary-2.9.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
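One way to hand a wheel like this to a Python shell job is the --extra-py-files job argument, which takes comma-separated S3 paths. A minimal sketch, assuming the job is created with boto3's glue.create_job(); the job name, role ARN, script path and wheel paths are hypothetical.

```python
from typing import Sequence


def python_shell_job_args(name: str, role_arn: str, script_s3_uri: str,
                          wheel_s3_uris: Sequence[str],
                          python_version: str = "3") -> dict:
    """Build keyword arguments for boto3's glue.create_job() (hypothetical helper)."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",
            "ScriptLocation": script_s3_uri,
            # "3" selects the Python 3.6 runtime, which is what a cp36 wheel targets
            "PythonVersion": python_version,
        },
        # Comma-separated S3 paths to .whl/.egg files made available to the job
        "DefaultArguments": {"--extra-py-files": ",".join(wheel_s3_uris)},
        # Python shell jobs run on either 0.0625 or 1 DPU
        "MaxCapacity": 0.0625,
    }


# Usage (requires boto3 and AWS credentials, so it is not run here):
#   import boto3
#   boto3.client("glue").create_job(**python_shell_job_args(
#       "spectrum-etl", "arn:aws:iam::123456789012:role/GlueJobRole",
#       "s3://my-bucket/scripts/etl.py",
#       ["s3://my-bucket/wheels/psycopg2_binary-2.9.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"]))
```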
An updated version can be found here.