简体   繁体   中英

How do I partition a table by all values?

I have an external table, now I want to add partitions to it. I have 224 unique city id's and I want to just write alter table my_table add partition (cityid) location /path; but hive complains, saying that I don't provide anything for the city id value, it should be eg alter table my_table add partition (cityid=VALUE) location /path; , but I don't want to run alter table commands for every value of city id, how can I do it for all id's in one go?

This is what hive command line looks like:

hive> alter table pavel.browserdata add partition (cityid) location '/user/maria_dev/data/cityidPartition';                                                                                                                                                                                                           

FAILED: ValidationFailureSemanticException table is not partitioned but partition spec exists: {cityid=null}

Partition on physical level is a location (separate location for each value, usually looks like key=value ) with data files. If you already have partitions directory structure with files, all you need is to create partitions in Hive metastore, then you can point your table to the root directory using ALTER TABLE SET LOCATION , then use MSCK REPAIR TABLE command. The equivalent command on Amazon Elastic MapReduce (EMR)'s version of Hive is: ALTER TABLE table_name RECOVER PARTITIONS . This will add Hive partitions metadata. See manual here: RECOVER PARTITIONS

If you have only not-partitioned table with data in it's location, then adding partitions will not work because the data needs to be reloaded, you need to:

Create another partitioned table and use insert overwrite to load partition data using dynamic partition load:

set hive.exec.dynamic.partition=true;   
set hive.exec.dynamic.partition.mode=nonstrict; 

insert overwrite table2 partition(cityid) 
select col1, ... colN,
       cityid    
  from table1; --partitions columns should be last in the select

This is quite efficient way to reorganize your data.

After this you can delete source table and rename your target table.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM