AWS Athena CTAS query failing, suggests emptying empty bucket

I am running a "CREATE TABLE AS SELECT" (CTAS) query ( https://docs.aws.amazon.com/athena/latest/ug/ctas.html ); the query is copied at the bottom. I am getting the following error message:

HIVE_PATH_ALREADY_EXISTS: Target directory for table 'default.openaq_processed' already exists:
 s3://<processed-data-bucketname>/. You may need to manually clean the data at location 
's3://<athena-query-results-bucketname>/Unsaved/2021/04/29/tables/82025a35-8867-4865-8f42-f40adb6bee4c' 
before retrying. Athena will not delete data in your account.

This query ran against the "default" database, unless qualified by the query. Please post the
 error message on our forum or contact customer support with Query Id: 82025a35-8867-4865-8f42-f40adb6bee4c.

The AWS knowledge center page on this error ( https://aws.amazon.com/premiumsupport/knowledge-center/athena-hive-path-already-exists/ ), like the error message above, suggests that the fix is to ensure the location used to store the query results is empty.

But it already is. In fact there is no tables/ prefix/folder in s3://<athena-query-results-bucketname>/Unsaved/2021/04/29/ , and the s3://<processed-data-bucketname>/ bucket is totally empty.

I've posted the question on the AWS forum but have had no responses. Any suggestions on how I might get this CTAS query to succeed would be much appreciated.

UPDATE: The query that throws the error:

CREATE TABLE openaq_processed
WITH (format='PARQUET', 
parquet_compression='SNAPPY', 
partitioned_by=array['country', 'parameter'], 
external_location = 's3://<processed-data-bucketname>') 
AS
SELECT date_utc as date_utc_str,
date_local as date_local_str,
CAST(from_iso8601_timestamp(date_utc) as timestamp) as timestamp_utc,
CAST(from_iso8601_timestamp(date_local) as timestamp) as timestamp_local,
"location",  -- location is a reserved word for Athena, needs quotes
value,
unit,
city,
attribution,
averagingperiod,
coordinates."latitude" as latitude,
coordinates."longitude" as longitude,
sourcename,
sourcetype,
mobile,
country,
parameter
FROM openaq_pq2_tables

So I sprang for AWS Developer Support and asked this question there. The response I got, which did fix the error, was to create a folder within my external_location bucket. I'm not sure why this is necessary, but apparently it is.

So, from the shell:

$ aws s3 mb s3://<processed-data-bucketname>/processed_data/

(mb above stands for "make bucket".)
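One caveat on the folder step: S3 has no real directories, only key prefixes, and mb is a bucket-creation command, so it may not leave a processed_data/ object behind. What the S3 console shows as a folder is a zero-byte object whose key ends in "/". If an explicit folder marker is wanted, the following is a minimal sketch of one way to create it. It is not what AWS Support prescribed. It assumes boto3 is installed and that s3_client is a boto3 S3 client; the function name and the bucket name in the usage comment are hypothetical.

```python
def create_folder_marker(s3_client, bucket: str, prefix: str) -> str:
    """Create a zero-byte object whose key ends in '/'.

    The S3 console displays such an object as a folder.
    `s3_client` is assumed to be a boto3 S3 client; this helper's
    name and parameters are illustrative, not part of any AWS API.
    """
    # Normalise "processed_data", "/processed_data/" etc. to "processed_data/".
    key = prefix.strip("/") + "/"
    if key == "/":
        raise ValueError("prefix must not be empty")
    # put_object with an empty body writes the folder marker.
    s3_client.put_object(Bucket=bucket, Key=key, Body=b"")
    return f"s3://{bucket}/{key}"


# Usage (hypothetical bucket name, requires AWS credentials):
#   import boto3
#   create_folder_marker(boto3.client("s3"), "my-processed-data-bucket", "processed_data")
```

The client is passed in rather than created inside the function so the helper can be exercised with a stub and without credentials.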

Then update external_location = 's3://<processed-data-bucketname>' in the query above to external_location = 's3://<processed-data-bucketname>/processed_data/'.
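Going only by the fix reported here (a location at the bucket root failed, a location under a folder worked), a small pre-flight check can catch the problem before the query is submitted. This is a sketch under that assumption, not a statement of Athena's documented rules. The function name and the bucket name in the usage comment are hypothetical.

```python
from urllib.parse import urlparse


def check_ctas_location(location: str) -> str:
    """Return a normalised s3://bucket/prefix/ location for a CTAS query.

    Raises ValueError if `location` is not an s3:// URI or if it points
    at the bucket root instead of a folder (prefix) inside the bucket.
    """
    parsed = urlparse(location)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {location!r}")
    prefix = parsed.path.strip("/")
    if not prefix:
        raise ValueError(
            f"{location!r} points at the bucket root; use a folder such as "
            f"s3://{parsed.netloc}/processed_data/"
        )
    return f"s3://{parsed.netloc}/{prefix}/"


# Usage (hypothetical bucket name):
#   location = check_ctas_location("s3://my-processed-data-bucket/processed_data")
#   with_clause = f"external_location = '{location}'"
```

The returned value always ends in "/", matching the form of the location that worked in the updated query.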
