I have an RDS database that is sitting in a VPC. My ultimate goal is to run a nightly job that takes the data from RDS and stores it in Redshift. I am currently doing this using Glue and Glue connections. I am able to write to RDS/Redshift using connections with the following line:
datasource2 = DynamicFrame.fromDF(dfFinal, glueContext, "scans")
output = glueContext.write_dynamic_frame.from_jdbc_conf(frame = datasource2, catalog_connection = "MPtest", connection_options = {"database" : "app", "dbtable" : "scans"})
Here dfFinal is my final data frame after a series of transformations that are not essential to this post. That code works fine; however, I would like to modify it so that I can read a table from RDS into a data frame.
Since the RDS database is in a VPC, I would like to use the catalog_connection parameter, but the DynamicFrameReader class has no from_jdbc_conf method and thus no obvious way to use my Glue connection.
I have seen posts that say you could use a method like this:
url = "jdbc:postgresql://host/dbName"
properties = {
    "user": "user",
    "password": "password"
}
df = spark.read.jdbc(url=url, table="table", properties=properties)
But when I try that it times out because it's not a publicly accessible database. Any suggestions?
You are on the right track with using Glue connections.
Define a Glue connection of type JDBC for your Postgres instance:
Type: JDBC
JDBC URL: jdbc:postgresql://<RDS ip>:<RDS port>/<database_name>
VPC Id: <VPC of RDS instance>
Subnet: <subnet of RDS instance>
Security groups: <Security Group allowed to connect to RDS>
Edit the Glue job and select the Glue connection so that it appears under "Required Connections".
Create the connection options dictionary as follows, then pass it to create_dynamic_frame.from_options:
options = {
    'url': connection.jdbc_url,
    'user': connection.username,
    'password': connection.password,
    'dbtable': table
}
table_ddf = glueContext.create_dynamic_frame.from_options(
    connection_type='postgresql',
    connection_options=options,
    transformation_ctx=transformation_ctx
)
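The snippet above leaves connection and table undefined, so here is a minimal sketch of one way the options dictionary could be assembled. It assumes the connection details are available as a plain dict with url, user and password keys, which is the shape I would expect from glueContext.extract_jdbc_conf("MPtest"). Depending on the Glue version, the url it returns may not include the database name, so check that before relying on it. The helper name build_read_options, the port 5432 and the stand-in credentials are hypothetical and only there so the example runs outside Glue.

```python
def build_read_options(conf, table):
    """Build connection_options for create_dynamic_frame.from_options.

    `conf` is assumed to be a dict with "url", "user" and "password" keys.
    """
    # Fail early with a clear message instead of a JDBC error later on.
    missing = [key for key in ("url", "user", "password") if not conf.get(key)]
    if missing:
        raise ValueError("connection config is missing: " + ", ".join(missing))
    return {
        "url": conf["url"],
        "user": conf["user"],
        "password": conf["password"],
        "dbtable": table,
    }


# Stand-in values (assumptions) so the sketch runs outside a Glue job.
# Inside a job, the dict would come from something like:
#   conf = glueContext.extract_jdbc_conf("MPtest")
conf = {
    "url": "jdbc:postgresql://host:5432/app",
    "user": "user",
    "password": "password",
}
options = build_read_options(conf, "scans")
print(options["url"], options["dbtable"])
```

In the job itself, the resulting options would be passed as connection_options to create_dynamic_frame.from_options, and calling .toDF() on the returned dynamic frame should give a regular Spark data frame. The Glue connection still has to be attached to the job under "Required Connections" so that the job runs inside the VPC.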