
How to add date to a file's name using UNLOAD in Redshift

I found 2 solutions:

  1. Using AWS Data Pipeline to schedule the query (UNLOAD) and use 's3://reporting-team-bucket/importfiles/test_123-#{format(@scheduledStartTime,'YYYY-MM-dd-HH')}.csv'
  2. Writing an MV command to rename the file on the S3 bucket

Is there a way to give a file's name the current date using only Redshift, with no other services?

Here is my code so far:

unload
(
'select * from table'
)
to 's3://bucket/unload_test/test_123_{CurrentDate}.gz'
ACCESS_KEY_ID '12345678910'
SECRET_ACCESS_KEY '10987654321'
GZIP
PARALLEL off; 

Just need to get CurrentDate to be 202106, for example.

Thanks!

I have never tried using UNLOAD inside a transaction, but if it works, you could use a stored procedure.
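A minimal, untested sketch of that idea: build the UNLOAD text inside a stored procedure with to_char(current_date, 'YYYYMM') and run it with dynamic EXECUTE. The procedure name unload_with_date is hypothetical, and the credentials are the question's placeholders; whether UNLOAD is actually allowed inside a procedure's transaction context is exactly the open question above.

CREATE OR REPLACE PROCEDURE unload_with_date()
AS $$
DECLARE
    -- Holds the dynamically built UNLOAD statement.
    sql_text VARCHAR(MAX);
BEGIN
    -- Splice the current date (e.g. 202106) into the S3 key.
    -- Single quotes inside the UNLOAD text are doubled.
    sql_text := 'unload (''select * from table'') '
             || 'to ''s3://bucket/unload_test/test_123_'
             || to_char(current_date, 'YYYYMM')
             || '.gz'' '
             || 'ACCESS_KEY_ID ''12345678910'' '      -- placeholder from the question
             || 'SECRET_ACCESS_KEY ''10987654321'' '  -- placeholder from the question
             || 'GZIP PARALLEL off';
    -- Run the assembled statement dynamically.
    EXECUTE sql_text;
END;
$$ LANGUAGE plpgsql;

-- CALL unload_with_date();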

Redshift UNLOAD gives an option to write the data out by partition. Use **PARTITION BY(COLUMN_NAME)**. Here is an example:

unload ('
        SELECT   col1
               , col2
               , col3
               , current_date as partition_by_me
          FROM dummy
')
to 's3://mybucket/dummy/'
partition by (partition_by_me)
iam_role 'arn of IAM role'
kms_key_id 'arn of kms key'
encrypted
FORMAT AS PARQUET;

In the above example, a dummy column partition_by_me is added as current_date and used in the unload command's partition by (partition_by_me) clause. Data in S3 lands in that specific partition.

The S3 path would be: s3://mybucket/dummy/partition_by_me=2022-08-18/000.parquet. Timestamp with time zone also works with this.
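If the YYYYMM format from the question (e.g. 202106) is wanted instead of the default date rendering, the partition column can be formatted before unloading. A sketch of the same example with the column pre-formatted (encryption options omitted for brevity; note the doubled single quotes inside the quoted query):

unload ('
        SELECT   col1
               , col2
               , col3
               , to_char(current_date, ''YYYYMM'') as partition_by_me
          FROM dummy
')
to 's3://mybucket/dummy/'
partition by (partition_by_me)
iam_role 'arn of IAM role'
FORMAT AS PARQUET;

Files would then land under a path like s3://mybucket/dummy/partition_by_me=202106/000.parquet.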

*** The dummy column does not get exported to the S3 file as an additional column unless you want to include it. The following clause needs to be used to include it in the unloaded data set:

partition by(partition_by_me) INCLUDE

The INCLUDE clause will include the column in the exported data sets.
