简体   繁体   中英

Loading data from one Hive table to another with partition

I have data in one Hive table and would like to load data into another hive table.

The source table is reg_logs which has 2 partitions, date and hour. The data gets loaded into this table hourly. The schema is:

CREATE EXTERNAL TABLE IF NOT EXISTS reg_logs (
id int,
region_code int,
count int
)
PARTITIONED BY (utc_date STRING, utc_hour STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/ad_data/raw/reg_logs';

The destination table is reg_logs_org all I would like to do is copy all data from reg_logs beside utc_hour column.

Schema I created is: (please correct if I am wrong)

CREATE EXTERNAL TABLE IF NOT EXISTS reg_logs_org (
id int,
region_code int,
count int
)
PARTITIONED BY (utc_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/ad_data/reg_logs_org';

Insert data into reg_logs_org from reg_logs:

insert overwrite table reg_logs_org
select id, region_code, sum(count), utc_date
from 
reg_logs
group by 
utc_date, id, region_code

error message:

 FAILED: SemanticException 1:23 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'reg_logs_org'

==

Thank you,
Rio

Create a copy of your table

CREATE TABLE my_table_backup LIKE my_table;

Enable dynamic partitioning

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.mapred.mode = nonstrict;

Copy the table

INSERT OVERWRITE TABLE my_table_backup PARTITION (ds)
SELECT * FROM my_table 
WHERE ds = ds;

The where clause is needed if you use strict mode.

this is because you are missing the partition info in your insert query

  insert overwrite table reg_logs_org PARTITION (utc_date)
  select id, region_code, sum(count), utc_date
  from 
  reg_logs
  group by 
  utc_date, id, region_code

In some cases you may need to set hive.exec.dynamic.partition.mode= nonstrict to be able to insert data into partitioned table, for example,

CREATE TABLE hivePartitionedTable
(
          c1    int
        , c2    int
        , c3    string
)
PARTITIONED BY (year  int)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
STORED AS SEQUENCEFILE
;

then this INSERT will work:

set hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO hivePartitionedTable PARTITION (year)
VALUES (1,2,'3', 1999);

It is mandatory to use partitioned column as last column while inserting the data. Hive will take the data which is there in last column.

So based on that insert query should be:-

  insert overwrite table reg_logs_org PARTITION (utc_date)
  select id, region_code, sum(count), utc_date
  from 
  reg_logs
  group by 
  utc_date, id, region_code

From the documentation :

The dynamic partition columns must be specified last among the columns in the SELECT statement and in the same order in which they appear in the PARTITION() clause

This will not work if the first partition is in the source table is empty , I mean there are no records in the first partition of source table. In this case I suggest to insert a dummy record along with the first partition in a separate insert script and truncate that data later.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM