
Amazon Athena: How to store query results without column headers?

I ran a simple query from the Athena dashboard against data in CSV format. The result was a CSV with column headers. When storing the results, Athena writes the column headers to S3 as well. How can I skip storing the header column names? I have to create a new table from the results, and the header row is redundant there.

From an Eric Hammond post on the AWS Forums:

...
  WHERE
    date NOT LIKE '#%'
...

I found this works! The steps I took:

However, subsequent queries store even more data in that S3 directory, which confuses any later executions.
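As a rough illustration of the `WHERE` filter quoted above, the sketch below applies the same idea to a table defined over an Athena result file. The table name `query_results` and the columns `field1` to `field3` are hypothetical, and the sketch assumes the SerDe strips the surrounding quotes, so the header row shows up as a data row whose first cell is the literal column name.

```sql
-- Hypothetical table over an Athena result CSV that still contains its header row.
-- Assumption: the header row appears as data with field1 = 'field1'.
SELECT field1, field2, field3
FROM query_results
WHERE field1 IS NULL          -- keep rows where the column is empty
   OR field1 <> 'field1';     -- drop the row that merely repeats the column name
```

This only hides the header at query time. It does not change what Athena writes to S3, and a genuine data row whose first value happens to equal the column name would be dropped too.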

Try "skip.header.line.count"="1". This feature has been available in AWS Athena since 2018-01-19. Here's a sample:

CREATE EXTERNAL TABLE IF NOT EXISTS tableName (
  `field1` string,
  `field2` string,
  `field3` string 
)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
 WITH SERDEPROPERTIES (
   'separatorChar' = ',',
   'quoteChar' = '\"',
   'escapeChar' = '\\'
   )
LOCATION 's3://fileLocation/'
TBLPROPERTIES ('skip.header.line.count'='1')
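
One way to check that the property took effect is to look for the header text among the returned rows. This is only a sketch: it assumes the table was created as `tableName` exactly as in the sample, and that the first cell of the header line in the file is the literal text `field1`.

```sql
-- Assumption: the CSV header's first cell is the text 'field1'.
-- If the header line is being skipped, no row should match,
-- provided no real data row carries that same value.
SELECT count(*) AS header_rows
FROM tableName
WHERE field1 = 'field1';
```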

You can refer to this question: Aws Athena - Create external table skipping first row
