
How to add date to a file's name using UNLOAD in Redshift

I found 2 solutions:

  1. Using AWS Data Pipeline to schedule the query (UNLOAD) and use 's3://reporting-team-bucket/importfiles/test_123-#{format(@scheduledStartTime,'YYYY-MM-dd-HH')}.csv'
  2. Writing an MV command to rename the file on the S3 bucket

Is there a way to give a file's name the current date using only Redshift, with no other services?

Here is my code so far:

unload
(
'select * from table'
)
to 's3://bucket/unload_test/test_123_{CurrentDate}.gz'
ACCESS_KEY_ID '12345678910'
SECRET_ACCESS_KEY '10987654321'
GZIP
PARALLEL off; 

Just need to get CurrentDate to be 202106, for example.

Thanks!

I have never tried using UNLOAD inside a transaction, but if it works, you could use a stored procedure.
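A minimal, untested sketch of that idea: build the UNLOAD text inside a stored procedure with to_char(current_date, 'YYYYMM') and run it with dynamic EXECUTE. The procedure name unload_with_date is hypothetical, and the credentials are the question's placeholders; whether UNLOAD is actually allowed inside a procedure's transaction context is exactly the open question above.

CREATE OR REPLACE PROCEDURE unload_with_date()
AS $$
DECLARE
    -- Holds the dynamically built UNLOAD statement.
    sql_text VARCHAR(MAX);
BEGIN
    -- Splice the current date (e.g. 202106) into the S3 key.
    -- Single quotes inside the UNLOAD text are doubled.
    sql_text := 'unload (''select * from table'') '
             || 'to ''s3://bucket/unload_test/test_123_'
             || to_char(current_date, 'YYYYMM')
             || '.gz'' '
             || 'ACCESS_KEY_ID ''12345678910'' '      -- placeholder from the question
             || 'SECRET_ACCESS_KEY ''10987654321'' '  -- placeholder from the question
             || 'GZIP PARALLEL off';
    -- Run the assembled statement dynamically.
    EXECUTE sql_text;
END;
$$ LANGUAGE plpgsql;

-- CALL unload_with_date();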

Redshift UNLOAD gives an option to write the data out by partition. Use **PARTITION BY(COLUMN_NAME)**. Here is an example:

unload ('
        SELECT   col1
               , col2
               , col3
               , current_date as partition_by_me
          FROM dummy
')
to 's3://mybucket/dummy/'
partition by (partition_by_me)
iam_role 'arn of IAM role'
kms_key_id 'arn of kms key'
encrypted
FORMAT AS PARQUET;

In the above example, a dummy column partition_by_me is added as current_date and used in the unload command's partition by (partition_by_me) clause. Data in S3 lands in that specific partition.

The S3 path would be: s3://mybucket/dummy/partition_by_me=2022-08-18/000.parquet. Timestamp with time zone also works with this.
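If the YYYYMM format from the question (e.g. 202106) is wanted instead of the default date rendering, the partition column can be formatted before unloading. A sketch of the same example with the column pre-formatted (encryption options omitted for brevity; note the doubled single quotes inside the quoted query):

unload ('
        SELECT   col1
               , col2
               , col3
               , to_char(current_date, ''YYYYMM'') as partition_by_me
          FROM dummy
')
to 's3://mybucket/dummy/'
partition by (partition_by_me)
iam_role 'arn of IAM role'
FORMAT AS PARQUET;

Files would then land under a path like s3://mybucket/dummy/partition_by_me=202106/000.parquet.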

*** The dummy column does not get exported to the S3 file as an additional column unless you want to include it. The following clause needs to be used to include it in the unloaded data set:

partition by(partition_by_me) INCLUDE

The INCLUDE clause will include the column in the exported data sets.
