
How to insert/copy one partition's data to multiple partitions in Hive?

I have data for dt='2019-01-01' in a Hive table, and I want to copy the same data to every other day of January 2019 (i.e. '2019-01-02', '2019-01-03', ..., '2019-01-31').

I am trying the following, but the data is inserted only into '2019-01-02', not into '2019-01-03'.

INSERT OVERWRITE TABLE db_t.students PARTITION(dt='2019-01-02', dt='2019-01-03')
SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';

Cross join all the data with the calendar dates of the required date range, and load the result using dynamic partitioning:

set hivevar:start_date=2019-01-02; 
set hivevar:end_date=2019-01-31; 

set hive.exec.dynamic.partition=true; 
set hive.exec.dynamic.partition.mode=nonstrict;  

with date_range as 
(--this query generates date range
select date_add ('${hivevar:start_date}',s.i) as dt 
  from ( select posexplode(split(space(datediff('${hivevar:end_date}','${hivevar:start_date}')),' ')) as (i,x) ) s
)

INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
SELECT id, name, marks, r.dt --partition column is the last one
  FROM db_t.students s 
       CROSS JOIN date_range r
 WHERE s.dt='2019-01-01'
DISTRIBUTE BY r.dt;
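The date_range CTE above relies on a small trick. space(n) builds a string of n spaces, and split(..., ' ') turns it into n + 1 empty elements. posexplode then emits their positions 0..n, and each position is added to the start date. The Python sketch below mirrors that logic outside Hive so the row count can be checked. It is only an illustration: the function name hive_date_range is hypothetical, and the code does not call Hive.

```python
from datetime import date, timedelta
from typing import List


def hive_date_range(start_date: str, end_date: str) -> List[str]:
    """Mimic the date_range CTE: one value per day, start_date..end_date inclusive."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    # datediff(end_date, start_date): number of days between the two dates
    n = (end - start).days
    # space(n) -> n spaces; split(..., ' ') -> n + 1 empty strings
    parts = (" " * n).split(" ")
    # posexplode yields positions 0 .. n; date_add(start_date, i) shifts the start date
    return [(start + timedelta(days=i)).isoformat() for i in range(len(parts))]


if __name__ == "__main__":
    print(hive_date_range("2019-01-02", "2019-01-04"))
    # ['2019-01-02', '2019-01-03', '2019-01-04']
```

For the January example, the range 2019-01-02 to 2019-01-31 produces 30 dates. The cross join therefore writes 30 copies of the 2019-01-01 rows, one per partition.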

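Another possible solution is to copy the partition data with hadoop fs -cp or hadoop distcp (repeat for each partition, or use a loop in the shell), and then register the new partition directories in the metastore, for example with MSCK REPAIR TABLE db_t.students: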

hadoop fs -cp '/usr/warehouse/students/dt=2019-01-01' '/usr/warehouse/students/dt=2019-01-02'
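Instead of repeating the copy by hand for each day, the loop can be scripted. The Python sketch below builds one hadoop fs -cp command per target partition and finishes with a metastore repair. It assumes the table directory /usr/warehouse/students from the example and the Hive CLI's hive -e option; Beeline users would substitute their own client command. The helper names copy_commands and run are hypothetical. By default the commands are only printed (a dry run), and they are executed only when dry_run=False.

```python
import shlex
import subprocess
from datetime import date, timedelta
from typing import List

# Assumed values taken from the example; adjust to the real table location.
TABLE_DIR = "/usr/warehouse/students"
SOURCE_DT = "2019-01-01"


def copy_commands(start_date: str, end_date: str) -> List[List[str]]:
    """Build one `hadoop fs -cp` per target partition (inclusive range), then a repair."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    src = f"{TABLE_DIR}/dt={SOURCE_DT}"
    cmds = []
    day = start
    while day <= end:
        dst = f"{TABLE_DIR}/dt={day.isoformat()}"
        cmds.append(["hadoop", "fs", "-cp", src, dst])
        day += timedelta(days=1)
    # Copying files does not create partitions in the metastore; register them afterwards.
    cmds.append(["hive", "-e", "MSCK REPAIR TABLE db_t.students;"])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(copy_commands("2019-01-02", "2019-01-31"))
```

hadoop distcp could replace hadoop fs -cp in the same loop when the partitions are large. It runs the copy as a MapReduce job instead of through a single client.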

There is also a solution using UNION ALL:

    set hive.exec.dynamic.partition=true; 
    set hive.exec.dynamic.partition.mode=nonstrict;      

    INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
    SELECT id, name, marks, '2019-01-02' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
    UNION ALL
    SELECT id, name, marks, '2019-01-03' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
    UNION ALL
    SELECT id, name, marks, '2019-01-04' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
    UNION ALL
    ...
    ;
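Writing one SELECT branch per day is repetitive, so the statement can be generated instead. The Python sketch below assembles the UNION ALL insert for an inclusive date range, using the table and column names from the example. The function name union_all_insert and the template constant are hypothetical. The output is meant to be pasted into Hive or passed to a client; the script does not run it.

```python
from datetime import date, timedelta

# One SELECT branch per target partition; {dt} is the new partition value.
SELECT_TEMPLATE = (
    "SELECT id, name, marks, '{dt}' as dt "
    "FROM db_t.students s WHERE s.dt='{source}'"
)


def union_all_insert(source_dt: str, start_date: str, end_date: str) -> str:
    """Build the dynamic-partition INSERT ... UNION ALL statement for start..end inclusive."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    branches = []
    day = start
    while day <= end:
        branches.append(SELECT_TEMPLATE.format(dt=day.isoformat(), source=source_dt))
        day += timedelta(days=1)
    header = (
        "set hive.exec.dynamic.partition=true;\n"
        "set hive.exec.dynamic.partition.mode=nonstrict;\n\n"
        "INSERT OVERWRITE TABLE db_t.students PARTITION(dt)\n"
    )
    # Branches are joined by UNION ALL; the statement ends with a semicolon.
    return header + "\nUNION ALL\n".join(branches) + "\n;"


if __name__ == "__main__":
    print(union_all_insert("2019-01-01", "2019-01-02", "2019-01-31"))
```

Each branch has its own scan of the source partition. For a full month, the cross-join version, which reads 2019-01-01 only once, is therefore likely to be the cheaper of the two.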


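Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.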

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM