简体   繁体   中英

how to add partition to hive table with nested data?

I am loading my logs from S3 into Hive with

 CREATE TABLE logs(
`col1` struct<`country`:string,`page`:string,`date`:string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://application-logs/sample/' ; 

My data looks like this

{
  "col1": {
    "country": "India",
    "page": "/signup",
    "date": "2018-01-01"
  }
}

If I want to create a partition on col1.country, col1.page, col1.date how should I include that in create a statement, I tried like colName.fieldName, but with no success.

You can try directly without mentioning the column name, something like below

 CREATE TABLE logs(
`col1` struct<`country`:string,`page`:string,`date`:string>
)
partitioned by (country string, page string, date string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://application-logs/sample/' ; 

Please note that external tables will not detect the partitions directly, you have to alter and add the partitions, something like below :

ALTER TABLE logs ADD PARTITION (country=india, pager=whatever, date=whatever) location '/hdfs/path/';

#You might also need to repair the table at the end

msck repair table schemaName.tableName

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM