简体   繁体   中英

Hive modify partitioned table data

Problem: One column value is null. It should be 'ab'. Unfortunately I have written '' instead 'ab'.

My table is partitioned table. Is there any way to change that?

I found the following way. But it seems inefficient.

  1. Create a temp table like my table
  2. Use INSERT OVERWRITE. Read data from my old table and write to new table. I am using case statement to change '' to 'ab'
  3. And then change my temp table to original table.

I am looking for a solution something like update partition and msck. Is there any way to do?

One possible solution would be to perform update on the table provided the column is not neither a partitioning nor bucketing column.

UPDATE tablename SET column = (CASE WHEN column = '' THEN 'ab' else column END) [WHERE expr if any];

Update: To support ACID operations on Hive

SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;

Note: works only if Hive >= 0.14

You can overwrite single partition in this way:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite target_table partition (part_col)
select 
case when column ='' then 'ab' else column end as column ,
col2,    --select all the columns in the same order
col3,
part_col --partition column is the last one
from target_table where part_col='your_partition_value';

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM