
How to modify a CTAS query to append query results to a table if the new partition doesn't exist? - Athena

I have a query that I want to execute daily, partitioned by the date it's executed. The results of this query should be appended to the same table.

Ideally I'd like something similar to the CREATE TABLE IF NOT EXISTS command that adds each day's data to the existing table as a new partition, if that partition doesn't already exist, but I can't figure out how to integrate this into my query.

My query:

CREATE TABLE IF NOT EXISTS db_name.table_name
WITH (
   external_location = 's3://my-query-results-location/',
   format = 'PARQUET',
   parquet_compression = 'SNAPPY',
   partitioned_by = ARRAY['date_executed'])
AS
SELECT
{columns_that_I_am_selecting_here_including_'date_executed'}

This creates a new table the first day it's executed, but nothing happens on subsequent days. I assume that's because CREATE TABLE IF NOT EXISTS sees that the table already exists and skips the rest of the statement.

Is there a way to modify my query so that it creates the table on the first day it's executed and appends the results as a new partition on each subsequent day?

I'm quite sure ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION would not apply to my use case here, as I'm running a CTAS query.

You can simply use INSERT INTO existing_table SELECT ...

Presumably your table is already partitioned, so include the partition column in the SELECT and Amazon Athena will automatically put the data in the correct directory.

For example, you might include the column like this: SELECT ... CURRENT_DATE AS date_executed
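
As a rough sketch of what the daily statement could look like once the table has been created by the CTAS query: db_name.table_name and date_executed come from the question, while source_table, col_a and col_b are hypothetical placeholders for the real source and columns. It assumes date_executed was created as a DATE, and it lists the partition column last because INSERT INTO matches columns by position.

```sql
-- Sketch only: source_table, col_a and col_b are hypothetical placeholders.
-- Run daily, after the table has been created once by the CTAS query.
INSERT INTO db_name.table_name
SELECT
    col_a,
    col_b,
    CURRENT_DATE AS date_executed  -- partition column, listed last
FROM db_name.source_table
```

Running SHOW PARTITIONS db_name.table_name afterwards should list one partition per day that has been loaded.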

See: INSERT INTO - Amazon Athena
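
INSERT INTO on its own appends every time it runs, so a second run on the same day would duplicate that day's rows. One possible way to get the "only if the partition doesn't exist" behaviour asked about is to guard the SELECT with a NOT EXISTS check on the target table. This is a sketch under the same assumptions as before (source_table, col_a and col_b are hypothetical, date_executed is a DATE), and it is worth testing on your own Athena engine version before relying on it.

```sql
-- Sketch only: source_table, col_a and col_b are hypothetical placeholders.
-- The NOT EXISTS guard makes the statement insert nothing when rows for
-- today's date_executed value are already present in the target table.
INSERT INTO db_name.table_name
SELECT
    col_a,
    col_b,
    CURRENT_DATE AS date_executed  -- partition column, listed last
FROM db_name.source_table
WHERE NOT EXISTS (
    SELECT 1
    FROM db_name.table_name
    WHERE date_executed = CURRENT_DATE
)
```

The guard checks for existing rows rather than partition metadata, so it treats an empty partition as missing.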
