
Athena query on gzip-compressed CSV returns a mix of compressed and decompressed results

I'm setting up AWS Athena with an S3 bucket that contains gzipped CSV files.

Then I run a query like this:

SELECT * FROM "sample_db"."sample_table2" limit 100;

The results differ between take 1 and take 2.

They seem to mix compressed and decompressed output.

Is there any way to get only decompressed results from Athena?

The file contents are below:

"title","user_info.client_user_id","user_info.player_id"
"test : csv take 4",,
"title","user_info.client_user_id","user_info.player_id"
"test : csv take 4",,
"title","user_info.client_user_id","user_info.player_id"
"test : csv take 4",,
"title","user_info.client_user_id","user_info.player_id"
"test : csv take 4",,

The S3 bucket contains only one file, test-sample.gz.

Query Take 1 (screenshot)

Query Take 2 (screenshot)

The cause was an incorrectly formatted query, partitioning applied to the CSV data, and corrupted data.

It works when the gz files are uploaded directly into directories in S3.
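Since corrupted data was part of the cause, it can help to confirm that an object really is a well-formed gzip stream before pointing Athena at it. The following is only a minimal sketch using Python's standard library; the function name `is_valid_gzip` is hypothetical, and it assumes the object's bytes have already been downloaded locally (for example, a local copy of test-sample.gz).

```python
import gzip
import io
import zlib

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_valid_gzip(data: bytes) -> bool:
    """Return True if `data` looks like gzip and decompresses without error.

    Hypothetical helper for illustration; not part of Athena or any AWS SDK.
    """
    if not data.startswith(GZIP_MAGIC):
        # Plain (uncompressed) CSV saved with a .gz name fails here.
        return False
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
            # Read to the end so the trailing CRC and length are checked too.
            while fh.read(64 * 1024):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile; EOFError covers truncated files.
        return False
    return True
```

To use it, read the downloaded file in binary mode, for example `is_valid_gzip(open("test-sample.gz", "rb").read())`. A `False` result suggests the file should be re-compressed and re-uploaded before querying it again.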
