
How to export data from AWS Aurora Postgres DB to Redshift?

I have a Postgres DB hosted on AWS Aurora from which I need to retrieve data and insert it into Redshift.

My current approach is as follows:

  1. Create an Aurora DB connection using Psycopg2.
  2. With the Aurora connection created above, query the Aurora DB table and export the result set as a CSV file to S3 using OUTFILE.
  3. Connect to Redshift using Psycopg2 and load the CSV file from S3.
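The three steps above can be sketched as follows. This is a minimal sketch, not the asker's exact code: all table, bucket, and role names are placeholders, and the export step here uses `COPY ... TO STDOUT` through Psycopg2 rather than `OUTFILE`.

```python
import io


def export_query_to_csv(aurora_conn, query: str) -> str:
    """Run a query on Aurora Postgres and return the result set as CSV text."""
    buf = io.StringIO()
    with aurora_conn.cursor() as cur:
        cur.copy_expert(f"COPY ({query}) TO STDOUT WITH CSV HEADER", buf)
    return buf.getvalue()


def upload_to_s3(csv_text: str, bucket: str, key: str) -> None:
    """Upload the exported CSV to S3."""
    import boto3  # imported lazily so the sketch loads without boto3 installed
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=csv_text.encode())


def build_copy_sql(table: str, bucket: str, key: str, iam_role: str) -> str:
    """Build the Redshift COPY statement that loads the CSV from S3."""
    return (f"COPY {table} FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{iam_role}' CSV IGNOREHEADER 1")


def load_into_redshift(redshift_conn, copy_sql: str) -> None:
    """Execute the COPY on a Psycopg2 connection to Redshift."""
    with redshift_conn.cursor() as cur:
        cur.execute(copy_sql)
    redshift_conn.commit()
```

Authorization for the COPY step is assumed to go through an IAM role attached to the Redshift cluster; credentials-based COPY works too.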

I'm trying to optimize this by removing the S3 step and connecting Aurora to Redshift directly.

Here's what I want to do, for which I couldn't find any resources:

Query the Aurora table table1 and directly export the result set into the Redshift table table1.

I'm not even sure if this is possible with the current system. Any thoughts?

There are two ways to get data into an Amazon Redshift database:

  • COPY command to load from Amazon S3
  • INSERT statement to insert data provided as part of the SQL statement

The COPY method is recommended for normal data loading. It runs in parallel across slices and stores the data as efficiently as possible, given that it is appending data.

The INSERT command is acceptable for a small number of inserts, but not a good idea for inserting lots of rows. Where possible, insert multiple rows at a time. It is also acceptable to use INSERT ... SELECT statements, which can insert bulk data from a different table in one operation.
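The "insert multiple rows at a time" advice can be sketched with Psycopg2's `execute_values`, which expands a single `VALUES %s` placeholder into many row tuples per round trip. Table and column names here are illustrative, not from the question.

```python
from typing import Iterable, Sequence


def build_insert_sql(table: str, columns: Sequence[str]) -> str:
    """Build an INSERT template with one VALUES placeholder that
    execute_values expands into many rows."""
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES %s"


def insert_rows(conn, table: str, columns: Sequence[str],
                rows: Iterable[tuple], page_size: int = 1000) -> None:
    """Insert rows in pages of `page_size` rows per statement --
    far fewer round trips than one single-row INSERT per row."""
    from psycopg2.extras import execute_values  # lazy: sketch loads without psycopg2
    with conn.cursor() as cur:
        execute_values(cur, build_insert_sql(table, columns), rows,
                       page_size=page_size)
    conn.commit()
```

On Redshift this is still slower than COPY, since the leader node has to parse and distribute every batch, but it is far better than row-at-a-time inserts.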

So, the only way to remove Amazon S3 from your pipeline is to feed the data into INSERT statements, but this is not an optimal way to load the data.
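If you accept that trade-off, a direct Aurora-to-Redshift transfer could be sketched as below: a server-side (named) cursor streams rows out of Aurora in batches, and each batch becomes one multi-row INSERT on Redshift. Connection handling, the cursor name, and the table name are assumptions for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunks(rows: Iterable[tuple], size: int) -> Iterator[List[tuple]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def copy_table_directly(aurora_conn, redshift_conn, table: str,
                        batch_size: int = 5000) -> None:
    """Stream `table` from Aurora into Redshift without staging in S3."""
    from psycopg2.extras import execute_values  # lazy: sketch loads without psycopg2
    # A named cursor keeps the result set on the Aurora server, so only
    # about `batch_size` rows are held in client memory at a time.
    with aurora_conn.cursor(name="aurora_export") as src:
        src.itersize = batch_size
        src.execute(f"SELECT * FROM {table}")
        with redshift_conn.cursor() as dst:
            for batch in chunks(src, batch_size):
                execute_values(dst, f"INSERT INTO {table} VALUES %s", batch)
    redshift_conn.commit()
```

For anything beyond small tables, the S3 + COPY route above will outperform this, because COPY loads in parallel across slices while every INSERT batch goes through the Redshift leader node.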

