
AWS Athena: partition by multiple columns in the same path

I am trying to create a table in Athena based on a directory in S3 that looks something like this:

folders/
  id=1/
    folder1/
    folder2/
    folder3/
      dt=***/
      dt=***/
  id=2/
...

I want to partition by two columns: one is id and the other is dt.

So in the end I want the table to have an id column and, for each id, all of the dt values under its sub-folder folder3. Is there a way to do this that doesn't force me to use a path like .../id=/dt= ?

I tried simply declaring these two columns in the PARTITIONED BY clause with the location set to the "folders" path, but then the table returns no data.

I then tried using injection and specifying a particular id in a WHERE clause when querying the table, but the results then contain data I don't need, and the partitioning doesn't seem to work as I expected.

Table DDL:

CREATE EXTERNAL TABLE IF NOT EXISTS `database`.`test_table` (
  `col1` string,
  `col2` string
) PARTITIONED BY (
  id string,
  dt string
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
) LOCATION 's3://folders/'

Appreciate any help!

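You can "manually" add the partitions with something like the following (partition values are quoted because both columns are declared as string, and each LOCATION is a full S3 URI under the table's 's3://folders/' location):

alter table your_table add if not exists
partition (id='1', dt='0')
location 's3://folders/id=1/folder3/dt=0/'
partition (id='1', dt='1')
location 's3://folders/id=1/folder3/dt=1/'
...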

You can add all of your partitions programmatically this way: use the AWS CLI to list the folders in S3, loop over them, and add each one to the table with a query like the one above (see the docs).
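A minimal sketch of the "list, loop, add" step, assuming the object keys have already been listed (e.g. with `aws s3 ls --recursive` or boto3) and follow the question's layout `id=<id>/folder3/dt=<dt>/<file>` inside the bucket from the question's LOCATION. The function names, the regex and the sample keys are hypothetical; the code only builds the ALTER TABLE text and does not run anything against Athena.

```python
import re
from typing import Iterable

# Hypothetical pattern for the question's layout: id=<id>/folder3/dt=<dt>/<file>.
# Keys under folder1/folder2 do not match and are skipped.
PARTITION_RE = re.compile(r"^(?P<prefix>.*?id=(?P<id>[^/]+)/folder3/dt=(?P<dt>[^/]+)/)")


def find_partitions(keys: Iterable[str]) -> list[tuple[str, str, str]]:
    """Return sorted, de-duplicated (id, dt, folder_prefix) triples from S3 object keys."""
    found = set()
    for key in keys:
        m = PARTITION_RE.match(key)
        if m:
            found.add((m.group("id"), m.group("dt"), m.group("prefix")))
    return sorted(found)


def build_add_partitions_sql(table: str, bucket: str, keys: Iterable[str]) -> str:
    """Build a single ALTER TABLE ... ADD IF NOT EXISTS statement for every partition folder."""
    clauses = [
        f"  PARTITION (id = '{pid}', dt = '{dt}') LOCATION 's3://{bucket}/{prefix}'"
        for pid, dt, prefix in find_partitions(keys)
    ]
    # Athena separates multiple PARTITION ... LOCATION pairs with whitespace, not commas.
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n" + "\n".join(clauses)


if __name__ == "__main__":
    sample_keys = [
        "id=1/folder3/dt=2024-01-01/part-0.csv",
        "id=1/folder3/dt=2024-01-01/part-1.csv",
        "id=1/folder1/other.csv",
        "id=2/folder3/dt=2024-01-02/part-0.csv",
    ]
    print(build_add_partitions_sql("`database`.`test_table`", "folders", sample_keys))
```

In practice the keys could come from boto3's `list_objects_v2` paginator (each page's `Contents` entries carry a `Key`), and the generated statement can be pasted into the Athena console or submitted with `aws athena start-query-execution`. With a very large number of folders you may need to split the statement into several batches.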

An alternative is partition projection with custom storage locations. It can make queries faster and removes the need to add new partitions manually when new data arrives in S3 (see the partition projection docs, especially the section on custom S3 locations).
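As a sketch of what the projection setup could look like for this layout, the snippet below keeps the table properties in a dict and renders them as a TBLPROPERTIES clause to append to the CREATE TABLE statement from the question. The property names (`projection.enabled`, `projection.<col>.type`, `.range`, `.format`, `.interval`, `.interval.unit`, `storage.location.template`) are Athena's. The values are assumptions: the question elides the real dt values, so the `yyyy-MM-dd` date format and the id range `1,2` are hypothetical. `preview_locations` is a local illustration only. It fills the `${id}`/`${dt}` placeholders with Python's `string.Template`, which uses the same placeholder syntax, to show which folder3 prefixes a query filtered on those values would read. It is not Athena's implementation.

```python
from itertools import product
from string import Template

# Hypothetical projection settings: the id range and dt format are assumptions.
PROJECTION_PROPS = {
    "projection.enabled": "true",
    "projection.id.type": "integer",
    "projection.id.range": "1,2",
    "projection.dt.type": "date",
    "projection.dt.range": "2024-01-01,NOW",
    "projection.dt.format": "yyyy-MM-dd",
    "projection.dt.interval": "1",
    "projection.dt.interval.unit": "DAYS",
    # Custom location: skips folder1/folder2 and points straight into folder3.
    "storage.location.template": "s3://folders/id=${id}/folder3/dt=${dt}/",
}


def tblproperties_clause(props: dict[str, str]) -> str:
    """Render the properties as a TBLPROPERTIES clause for the CREATE TABLE DDL."""
    body = ",\n".join(f"  '{key}' = '{value}'" for key, value in props.items())
    return f"TBLPROPERTIES (\n{body}\n)"


def preview_locations(props: dict[str, str], ids: list[str], dts: list[str]) -> list[str]:
    """Substitute ${id}/${dt} in the location template for every (id, dt) combination."""
    template = Template(props["storage.location.template"])
    return [template.substitute(id=i, dt=d) for i, d in product(ids, dts)]


if __name__ == "__main__":
    print(tblproperties_clause(PROJECTION_PROPS))
    for location in preview_locations(PROJECTION_PROPS, ["1", "2"], ["2024-01-01"]):
        print(location)
```

Because the template names folder3 explicitly, a query such as `WHERE id = '1' AND dt = '2024-01-01'` only touches that one prefix, and no ALTER TABLE calls are needed as new dt folders appear.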
