How to add the date to a file's name using UNLOAD in Redshift
Is there a way to add the current date to a file's name using only Redshift, with no other services?
Here is my code so far:
unload
(
'select * from table'
)
to 's3://bucket/unload_test/test_123_{CurrentDate}.gz'
ACCESS_KEY_ID '12345678910'
SECRET_ACCESS_KEY '10987654321'
GZIP
PARALLEL off;
I just need {CurrentDate} to become, for example, 202106.
Thanks!
I have never tried using UNLOAD in a transaction, but if it works, you could use a stored procedure.
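The stored procedure idea can be sketched as follows, assuming UNLOAD can be run through dynamic SQL (EXECUTE) inside a procedure, as the answer supposes. The procedure name unload_with_date_suffix and its variables are hypothetical; the bucket path, table, and credentials are the placeholders from the question.

```sql
-- Hypothetical procedure: builds the UNLOAD statement as a string so the
-- S3 name prefix can contain the current date formatted as YYYYMM.
CREATE OR REPLACE PROCEDURE unload_with_date_suffix()
AS $$
DECLARE
    date_suffix VARCHAR(6);
    unload_sql  VARCHAR(65535);
BEGIN
    -- Format the current date as year and month, e.g. 202106.
    date_suffix := TO_CHAR(CURRENT_DATE, 'YYYYMM');

    -- Single quotes inside the statement are doubled to escape them.
    unload_sql :=
           'UNLOAD (''select * from table'') '
        || 'TO ''s3://bucket/unload_test/test_123_' || date_suffix || '.gz'' '
        || 'ACCESS_KEY_ID ''12345678910'' '
        || 'SECRET_ACCESS_KEY ''10987654321'' '
        || 'GZIP '
        || 'PARALLEL OFF';

    EXECUTE unload_sql;
END;
$$ LANGUAGE plpgsql;

-- Run the unload.
CALL unload_with_date_suffix();
```

Keep in mind that the value after TO is a name prefix rather than a complete file name, so UNLOAD still appends its own part suffix to it.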
Redshift UNLOAD provides an option to unload the data by partition. Use **PARTITION BY(COLUMN_NAME)**. Here is an example:
unload ('
SELECT col1
, col2
, col3
, current_date as partition_by_me
FROM dummy
'
)
to 's3://mybucket/dummy/'
partition by(partition_by_me)
iam_role 'arn of IAM role'
kms_key_id 'arn of kms key'
encrypted
FORMAT AS PARQUET
In the above example, a dummy column partition_by_me is added with the value current_date and then used in the UNLOAD command as partition by(partition_by_me). The data in S3 lands in that specific partition.
The S3 path would be: s3://mybucket/dummy/partition_by_me=2022-08-18/000.parquet
Timestamp with time zone does work with this.
Note: the dummy column is not exported to the S3 files as an additional column unless you choose to include it. To include it in the unloaded data set, use the following clause:
partition by(partition_by_me) INCLUDE
The INCLUDE clause includes the partition column in the exported data sets.
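To get a year and month value such as 202106 (the format asked for in the question) instead of a full date, the partition column could be built with TO_CHAR. This is only a sketch under the same placeholder table, bucket, and role as the example above; the objects should then land under a prefix of the form partition_by_me=YYYYMM.

```sql
-- Sketch: partition the unloaded files by year and month (YYYYMM).
-- Quotes inside the quoted SELECT are doubled to escape them.
unload ('
    SELECT col1
         , col2
         , col3
         , to_char(current_date, ''YYYYMM'') as partition_by_me
    FROM dummy
')
to 's3://mybucket/dummy/'
iam_role 'arn of IAM role'
FORMAT AS PARQUET
-- INCLUDE keeps the partition column in the unloaded files as well.
partition by (partition_by_me) INCLUDE;
```

Without INCLUDE, the date appears only in the S3 key and not inside the files themselves.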