
How to unload csv file type when unload is compressed with gzip?

Hi, I have this query that unloads data from Redshift to S3, outputting CSV files compressed with gzip. Supposedly, extracting the gzip archive should give me the CSV file, but instead it extracts as "file".

The attached image shows the output for the partitioned year, 2018. I was expecting the unzipped file to be in CSV format, since I specified that in the query, but instead its file type is just "file". [image: the extracted gzip archive]

Query:

UNLOAD ($$ SELECT *, (date_part("year", last_updated))::text as year FROM table WHERE date_part("year", last_updated) <= (date_part("year", CURRENT_DATE)-1) $$)
TO 's3://'
IAM_ROLE  ''
PARTITION BY (year) 
CSV DELIMITER AS  '|'
GZIP
PARALLEL FALSE
ALLOWOVERWRITE
MAXFILESIZE AS 100 MB;

A little more on what you are getting would be helpful, but I think I see the issue. You specified a partition column, which splits the output into multiple files by that value (one set per year), but you did not add the INCLUDE option to PARTITION BY, which would tell UNLOAD to keep the partition values in the output files as well. Since you have only one column, year, and it is being used for the partitioned file names, you get empty files.
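For what it's worth, here is a minimal sketch of your UNLOAD with the INCLUDE keyword added after PARTITION BY, assuming that is the only change needed; the S3 path and IAM role are left blank exactly as in your post:

UNLOAD ($$ SELECT *, (date_part("year", last_updated))::text as year FROM table WHERE date_part("year", last_updated) <= (date_part("year", CURRENT_DATE)-1) $$)
TO 's3://'                      -- target path redacted, as in the question
IAM_ROLE ''                     -- role ARN redacted, as in the question
PARTITION BY (year) INCLUDE     -- INCLUDE keeps the year value inside the files too
CSV DELIMITER AS '|'
GZIP
PARALLEL FALSE
ALLOWOVERWRITE
MAXFILESIZE AS 100 MB;

With INCLUDE, UNLOAD still lays the files out under partition paths like year=2018/, but it no longer strips the year column from the file contents.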

Without more info it will be hard to do better than this interpretation of your commands.
