
Creating partitions in an external table in Hive

I have successfully created and added dynamic partitions to an internal table in Hive, using the following steps:

1. Created a source table

2. Loaded data from a local file into the source table

3. Created another table with partitions - partition_table

4. Inserted the data into this table from the source table, which created all the partitions dynamically (see the sketch below)

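Roughly, that internal-table flow looks like this (table, column, and path names are just placeholders):

create table source_table (name string, age int, height int)
row format delimited fields terminated by ',';

load data local inpath '/local/path/to/data.csv' into table source_table;

create table partition_table (name string, height int)
partitioned by (age int);

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table partition_table partition(age)
select name, height, age from source_table;
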
My question is: how do I do the same thing with an external table? I have read many articles on this, but I am still confused. Do I have to specify the path to the already existing partitions when creating partitions for an external table?

Example: Step 1:

create external table table1 (name string, age int, height int)
location 'path/to/dataFile/in/HDFS';

Step 2:

alter table table1 add partition(age) 
location 'path/to/already/existing/partition'

I am not sure how to proceed with partitioning in external tables. Can somebody please help by giving a step-by-step description of how to do this?

Thanks in advance!

Yes, you have to tell Hive explicitly what your partition field is.

Suppose you have the following HDFS directory on which you want to create an external table:

/path/to/dataFile/

Let's say this directory already has data stored (partitioned) by department, as follows:

/path/to/dataFile/dept1
/path/to/dataFile/dept2
/path/to/dataFile/dept3

Each of these directories has a bunch of files, where each file contains the actual comma-separated data for fields such as name, age, and height.

e.g.
    /path/to/dataFile/dept1/file1.txt
    /path/to/dataFile/dept1/file2.txt
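
For illustration, a hypothetical file1.txt could contain rows such as:

    John,25,170
    Mary,30,165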

Now let's create an external table on top of this:

Step 1. Create external table:

CREATE EXTERNAL TABLE testdb.table1(name string, age int, height int)
PARTITIONED BY (dept string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/path/to/dataFile/';

Step 2. Add partitions:

ALTER TABLE testdb.table1 ADD PARTITION (dept='dept1') LOCATION '/path/to/dataFile/dept1';
ALTER TABLE testdb.table1 ADD PARTITION (dept='dept2') LOCATION '/path/to/dataFile/dept2';
ALTER TABLE testdb.table1 ADD PARTITION (dept='dept3') LOCATION '/path/to/dataFile/dept3';

Done. Run a select query once to verify that the data loaded successfully.
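
For example (assuming the table and partitions above), queries like these should list the registered partitions and return a few rows:

SHOW PARTITIONS testdb.table1;
SELECT * FROM testdb.table1 WHERE dept='dept1' LIMIT 10;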

1. Set the below properties:

set hive.exec.dynamic.partition=true;

set hive.exec.dynamic.partition.mode=nonstrict;

2. Create an external partitioned table

create external table table1 (name string, height int)
partitioned by (age int)
location 'path/to/dataFile/in/HDFS';

3. Insert data into the partitioned table from the source table (see the sketch below).

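A minimal sketch of that insert, assuming a source_table with columns name, age, and height as in the question (the dynamic partition column must come last in the select list):

insert into table table1 partition(age)
select name, height, age from source_table;
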
Basically, the process is the same; it's just that you create an external partitioned table and provide the HDFS path under which Hive will create and store the partitions.

Hope this helps.

The proper way to do it:

  1. Create the table and declare it as partitioned.

    create external table table1 (name string, height int) partitioned by (age int) stored as ****(your format) location 'path/to/dataFile/in/HDFS';

  2. Now refresh the partitions in the Hive metastore.

    msck repair table table1

This will take care of loading all your partitions into the Hive metastore.
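
Note that msck repair table only discovers directories that follow Hive's partition naming convention (partition_column=value) under the table location, e.g.:

    path/to/dataFile/in/HDFS/age=25/file1.txt
    path/to/dataFile/in/HDFS/age=30/file2.txt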

You can use msck repair table at any point during your process to have the metastore updated.

Follow the below steps:

  1. Create a temporary/source table

    create table source_table(name string, age int, height int) row format delimited fields terminated by ',';

    Use the delimiter that appears in your file instead of ','.

  2. Load data into the source table

    load data local inpath 'path/to/dataFile/in/HDFS' into table source_table;
  3. Create an external partitioned table

    create external table external_dynamic_partitions(name string,height int) partitioned by (age int) location 'path/to/dataFile/in/HDFS';
  4. Set dynamic partition mode to nonstrict

    set hive.exec.dynamic.partition.mode=nonstrict;
  5. Load data into the external partitioned table from the source table

    insert into table external_dynamic_partitions partition(age) select name, height, age from source_table;

That's it. You can check the partition information using

show partitions external_dynamic_partitions;

You can even check if it is an external table or not using

describe formatted external_dynamic_partitions;
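
In the describe formatted output, the table type line should read something like:

Table Type:             EXTERNAL_TABLE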

An external table is a type of table in Hive where the data is not moved into the Hive warehouse directory. That means that even if you drop the table, the data still persists at its original location and you will always see the latest data there, which is not the case with a managed table.
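
A quick way to see this for yourself (assuming the external table created above, and running from the Hive CLI):

drop table external_dynamic_partitions;
-- the files under the table location are left untouched:
dfs -ls path/to/dataFile/in/HDFS;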
