
Using Python, what is the standard approach to load data from S3 into AWS RDS Postgres?

Per the Amazon RDS docs, it looks like AWS offers an aws_s3 PostgreSQL extension for transferring data from S3 into Postgres on RDS.

We're using Airflow to orchestrate our data ingestion pipelines, and it would be great if there were a Python solution here. I have little experience with PostgreSQL and have never used any PostgreSQL extensions, so being able to move data around using Python would help us a ton. For the time being, we are avoiding AWS tools such as AWS Data Pipeline and AWS Glue in favor of building our own architecture with Python and Airflow.

For reference, this is how our GCP architecture ingests data from GCS into BigQuery using Python:

from google.cloud import bigquery

# create BigQuery client object + load job config
client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    schema=None,  # autodetect for now
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,  # use ndjson
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,  # append to existing
    autodetect=True,
)

# and load into BigQuery
table_id = "our_gcp_project.our_model.our_table"
gcs_uri = "gs://our_bucket/path-to-our/file.json"
load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config) # location="US"  # Make an API request.
load_job.result()  # Waits for the job to complete

# check for success
destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))

We're pretty much looking to port this code from GCS/BigQuery into S3/Postgres RDS, and want to get started in the right direction.
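One possible starting point for such a port is the aws_s3 extension mentioned at the top, since its import function can be called through any Python PostgreSQL driver. The following is a minimal sketch, not a definitive implementation. It assumes the extension has been installed with `CREATE EXTENSION aws_s3 CASCADE;`, that the RDS instance has an IAM role allowing it to read the bucket, and that `conn` is an open DB-API connection such as one returned by `psycopg2.connect`. The helper name `import_from_s3`, the default CSV options, and the bucket, key, and table names are illustrative assumptions. The import is built on `COPY`, so it expects text, CSV, or binary files rather than autodetected ndjson.

```python
# Parameterized call to the aws_s3 import function.
IMPORT_SQL = """
SELECT aws_s3.table_import_from_s3(
    %s,
    %s,
    %s,
    aws_commons.create_s3_uri(%s, %s, %s)
)
"""


def import_from_s3(conn, table, bucket, key, region,
                   columns="", options="(format csv, header true)"):
    """Ask Postgres to pull one S3 object into `table`.

    `conn` is assumed to be an open DB-API connection (e.g. psycopg2).
    An empty `columns` string means "all columns of the target table".
    Returns the status text produced by the extension.
    """
    params = (table, columns, options, bucket, key, region)
    with conn.cursor() as cur:
        cur.execute(IMPORT_SQL, params)
        status = cur.fetchone()[0]
    conn.commit()
    return status
```

Inside an Airflow task this could be called as `import_from_s3(conn, "our_table", "our_bucket", "path-to-our/file.csv", "us-east-1")`, with the connection created by whatever hook or driver the deployment already uses.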

One option is to invoke a Lambda function from PostgreSQL.

PostgreSQL-Lambda

The Lambda runtime can be set to Python, and inside the function you can use the Boto3 library to access AWS services such as S3.

Boto3
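As a rough illustration of that idea, here is a sketch of a Python Lambda that reads a newline-delimited JSON object from S3 with Boto3 and inserts each record into Postgres. Several things are assumptions made for the example: the event keys `bucket` and `key`, the `DATABASE_URL` environment variable, the target table `our_table` with a single `payload` column of type json or jsonb, and the use of psycopg2 as the driver. The S3 client and the connection are passed in as arguments so the loading logic can be exercised without AWS.

```python
import json
import os


def parse_ndjson(body: bytes) -> list:
    """Decode newline-delimited JSON bytes into a list of Python objects."""
    text = body.decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_object(s3_client, conn, bucket: str, key: str) -> int:
    """Copy one NDJSON object from S3 into the hypothetical our_table(payload)."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    rows = [(json.dumps(record),) for record in parse_ndjson(body)]
    with conn.cursor() as cur:
        cur.executemany("INSERT INTO our_table (payload) VALUES (%s)", rows)
    conn.commit()
    return len(rows)


def handler(event, context):
    """Lambda entry point; the event keys "bucket" and "key" are assumed."""
    # Imported here so the helpers above can be tested without these packages.
    import boto3
    import psycopg2

    conn = psycopg2.connect(os.environ["DATABASE_URL"])  # hypothetical variable
    try:
        count = load_object(boto3.client("s3"), conn, event["bucket"], event["key"])
    finally:
        conn.close()
    return {"loaded": count}
```

Row-by-row inserts keep the sketch short; for large files a bulk path such as `COPY` would likely be a better fit.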

Be aware of Lambda's limits, such as the 15-minute maximum run time and the caps on payload size.

Lambda Limits
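One way to stay inside the run-time limit is to load in batches and stop while there is still time left, returning how far the function got so a later invocation can resume. The sketch below relies on the `get_remaining_time_in_millis()` method of the Lambda context object; the function name `load_in_batches`, the `load_batch` callback, the batch size, and the 30-second safety margin are all illustrative assumptions.

```python
def load_in_batches(records, load_batch, context,
                    batch_size=1000, safety_ms=30_000):
    """Load `records` batch by batch until done or nearly out of time.

    `load_batch` is a caller-supplied function that writes one list of
    records to the database. Returns the number of records loaded, which
    the caller can use as the resume offset for a follow-up invocation.
    """
    loaded = 0
    for start in range(0, len(records), batch_size):
        # Stop before starting a batch we may not have time to finish.
        if context.get_remaining_time_in_millis() < safety_ms:
            break
        batch = records[start:start + batch_size]
        load_batch(batch)
        loaded += len(batch)
    return loaded
```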

Also, a Lambda that needs to access the database requires the database drivers. You can package them in a layer and attach that layer to your function.

Lambda Layers
