简体   繁体   中英

AWS Glue read from RDS database that's in VPC

I have an RDS database that is sitting in a VPC. My ultimate goal is to run a nightly job that takes the data from RDS and stores it in Redshift. I am currently doing this using Glue and Glue connections. I am able to write to RDS/Redshift using connections with the following line:

datasource2 = DynamicFrame.fromDF(dfFinal, glueContext, "scans")

output = glueContext.write_dynamic_frame.from_jdbc_conf(frame = datasource2, catalog_connection = "MPtest", connection_options = {"database" : "app", "dbtable" : "scans"})

Where dfFinal is my final data frame after a bunch of transformations that are not essential to this post. That code works fine, however I would like to modify it so I could read a table from RDS into a data frame.

Since the RDS database is in a VPC, I would like to use the catalog_connection parameter, but the DynamicFrameReader class has no from_jdbc_conf method and thus no obvious way to use my glue connection.

I have seen posts that say you could use a method like this:

url = "jdbc:postgresql://host/dbName"
properties = {
"user" : "user",
"password" : "password"
}
df = spark.read.jdbc(url=url, table="table", properties=properties)

But when I try that it times out because it's not a publicly accessible database. Any suggestions?

You are on the right track with using Glue connections.

  1. Define Glue connection of Type JDBC for your Postgres instance

    Type JDBC JDBC URL jdbc:postgresql://<RDS ip>:<RDS port>/<database_name> VPC Id <VPC of RDS instance> Su.net <su.net of RDS instance> Security groups <Security Group allowed to connect to RDS>
  2. Edit Glue Job, and select the Glue Connection so it appears under "Required Connections"

  3. Create connections options dictionary as


options = {'url': connection.jdbc_url,

           'user': connection.username,

           'password': connection.password,

           'dbtable': table
}

  1. Use the options dictionary in the below to create a Dynamic frame to read from the table

table_ddf = glueContext.create_dynamic_frame.from_options(

        connection_type='postgresql',

        connection_options=options,

        transformation_ctx=transformation_ctx
)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM