
How to export data from AWS Aurora Postgres DB to Redshift?

I have a Postgres DB hosted on AWS Aurora from which I need to retrieve data and insert it into Redshift.

My current approach is as follows:

  1. Create an Aurora DB connection using Psycopg2.
  2. With the Aurora connection created above, query the Aurora DB table and export the result set as a CSV file to S3 using OUTFILE.
  3. Connect to Redshift using Psycopg2 and load the CSV file from S3.
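The three steps above can be sketched as follows. This is a minimal sketch, not the asker's exact code: all table, bucket, and role names are placeholders, and the export step here uses `COPY ... TO STDOUT` through Psycopg2 rather than `OUTFILE`.

```python
import io


def export_query_to_csv(aurora_conn, query: str) -> str:
    """Run a query on Aurora Postgres and return the result set as CSV text."""
    buf = io.StringIO()
    with aurora_conn.cursor() as cur:
        cur.copy_expert(f"COPY ({query}) TO STDOUT WITH CSV HEADER", buf)
    return buf.getvalue()


def upload_to_s3(csv_text: str, bucket: str, key: str) -> None:
    """Upload the exported CSV to S3."""
    import boto3  # imported lazily so the sketch loads without boto3 installed
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=csv_text.encode())


def build_copy_sql(table: str, bucket: str, key: str, iam_role: str) -> str:
    """Build the Redshift COPY statement that loads the CSV from S3."""
    return (f"COPY {table} FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{iam_role}' CSV IGNOREHEADER 1")


def load_into_redshift(redshift_conn, copy_sql: str) -> None:
    """Execute the COPY on a Psycopg2 connection to Redshift."""
    with redshift_conn.cursor() as cur:
        cur.execute(copy_sql)
    redshift_conn.commit()
```

Authorization for the COPY step is assumed to go through an IAM role attached to the Redshift cluster; credentials-based COPY works too.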

I'm trying to optimize this by removing the S3 step and connecting Aurora to Redshift directly.

Here's what I want to do, for which I couldn't find any resources:

Query the Aurora table table1 and directly export the result set into the Redshift table table1.

I'm not even sure if this is possible with the current system. Any thoughts?

There are two ways to get data into an Amazon Redshift database:

  • COPY command to load from Amazon S3
  • INSERT statement to insert data provided as part of the SQL statement

The COPY method is recommended for normal data loading. It runs in parallel across slices and stores the data as efficiently as possible, given that it is appending data.

The INSERT command is acceptable for a small number of inserts, but not a good idea for inserting lots of rows. Where possible, insert multiple rows at a time. It is also acceptable to use INSERT ... SELECT statements, which can insert bulk data from a different table in one operation.
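The "insert multiple rows at a time" advice can be sketched with Psycopg2's `execute_values`, which expands a single `VALUES %s` placeholder into many row tuples per round trip. Table and column names here are illustrative, not from the question.

```python
from typing import Iterable, Sequence


def build_insert_sql(table: str, columns: Sequence[str]) -> str:
    """Build an INSERT template with one VALUES placeholder that
    execute_values expands into many rows."""
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES %s"


def insert_rows(conn, table: str, columns: Sequence[str],
                rows: Iterable[tuple], page_size: int = 1000) -> None:
    """Insert rows in pages of `page_size` rows per statement --
    far fewer round trips than one single-row INSERT per row."""
    from psycopg2.extras import execute_values  # lazy: sketch loads without psycopg2
    with conn.cursor() as cur:
        execute_values(cur, build_insert_sql(table, columns), rows,
                       page_size=page_size)
    conn.commit()
```

On Redshift this is still slower than COPY, since the leader node has to parse and distribute every batch, but it is far better than row-at-a-time inserts.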

So, the only way to remove Amazon S3 from your pipeline is to feed the data into INSERT statements, but this is not an optimal way to load the data.
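If you accept that trade-off, a direct Aurora-to-Redshift transfer could be sketched as below: a server-side (named) cursor streams rows out of Aurora in batches, and each batch becomes one multi-row INSERT on Redshift. Connection handling, the cursor name, and the table name are assumptions for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunks(rows: Iterable[tuple], size: int) -> Iterator[List[tuple]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def copy_table_directly(aurora_conn, redshift_conn, table: str,
                        batch_size: int = 5000) -> None:
    """Stream `table` from Aurora into Redshift without staging in S3."""
    from psycopg2.extras import execute_values  # lazy: sketch loads without psycopg2
    # A named cursor keeps the result set on the Aurora server, so only
    # about `batch_size` rows are held in client memory at a time.
    with aurora_conn.cursor(name="aurora_export") as src:
        src.itersize = batch_size
        src.execute(f"SELECT * FROM {table}")
        with redshift_conn.cursor() as dst:
            for batch in chunks(src, batch_size):
                execute_values(dst, f"INSERT INTO {table} VALUES %s", batch)
    redshift_conn.commit()
```

For anything beyond small tables, the S3 + COPY route above will outperform this, because COPY loads in parallel across slices while every INSERT batch goes through the Redshift leader node.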

