
How to modify a CTAS query to append query results to a table as a new partition only if that partition doesn't exist? - Athena

I have a query that I want to execute daily, with the results partitioned by the date it's executed. The results of each run should be appended to the same table.

Ideally I'd like something similar to CREATE TABLE IF NOT EXISTS, but for data: every day the query would add a new partition to the existing table if that partition doesn't already exist. I can't figure out how to integrate this into my query.

My query:

CREATE TABLE IF NOT EXISTS db_name.table_name
WITH (
   external_location = 's3://my-query-results-location/',
   format = 'PARQUET',
   parquet_compression = 'SNAPPY',
   partitioned_by = ARRAY['date_executed'])
AS
SELECT
{columns_that_I_am_selecting_here_including_'date_executed'}

This creates the table on the first day it runs, but nothing happens on subsequent days. I assume that's because CREATE TABLE IF NOT EXISTS sees that the table already exists and skips the rest of the statement.

Is there a way to modify my query so that it creates the table on the first day and appends each subsequent day's results as a new partition?

I'm quite sure ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION would not apply to my use case here as I'm running a CTAS query.

Once the table exists, you can simply use INSERT INTO existing_table SELECT ... for each subsequent run.

Presumably your table is already partitioned, so include that partition column in the SELECT (as the last column, matching the table's column order) and Amazon Athena will automatically put the data in the correct directory.

For example, you might include the column like this: SELECT ..., CURRENT_DATE AS date_executed
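Below is a minimal sketch of the daily statement. It assumes the table was already created once by the CTAS query in the question. `col_a`, `col_b` and `source_table` are hypothetical placeholders for your own columns and source. It also assumes `date_executed` was created as a `date` column; if you stored it as a string, compare against `CAST(CURRENT_DATE AS varchar)` instead. The `NOT EXISTS` guard is an optional addition that addresses the "only if the partition doesn't exist" part of the question: re-running the job on the same day becomes a no-op instead of appending duplicate rows. It is assumed that your Athena engine version accepts a subquery on the target table. If it does not, run the existence check as a separate query first.

```sql
-- Daily append into the table created once by the CTAS query.
-- col_a, col_b and source_table are hypothetical placeholders.
-- Columns are matched by position, so the partition column goes last.
INSERT INTO db_name.table_name
SELECT
    col_a,
    col_b,
    CURRENT_DATE AS date_executed
FROM source_table
-- Optional guard (assumption: date_executed is a DATE column):
-- insert nothing if today's partition already contains rows.
WHERE NOT EXISTS (
    SELECT 1
    FROM db_name.table_name
    WHERE date_executed = CURRENT_DATE
);
```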

See: INSERT INTO - Amazon Athena
