
Apache Hive MSCK REPAIR TABLE: new partition not added

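I am new to Apache Hive. While working with partitions of an external table, if I add a new partition directly to HDFS, the new partition is not added after running MSCK REPAIR TABLE. Below is the code I tried: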

- Create the external table

hive> create external table factory(name string, empid int, age int) partitioned by(region string)  
    > row format delimited fields terminated by ','; 

- Detailed table information

Location:  hdfs://localhost.localdomain:8020/user/hive/warehouse/factory     
Table Type:             EXTERNAL_TABLE           
Table Parameters:        
    EXTERNAL                TRUE                
    transient_lastDdlTime   1438579844  

- Create directories in HDFS to hold the data for table factory

[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'

- Table data

cat factory1.txt
emp1,500,40
emp2,501,45
emp3,502,50

cat factory2.txt
EMP10,200,25
EMP11,201,27
EMP12,202,30

- Copy from local to HDFS

[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory1.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory2.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'

- Alter the table to update the metastore

hive> alter table factory add partition(region='southregion') location '/user/hive/testing/testing1/factory2';
hive> alter table factory add partition(region='northregion') location '/user/hive/testing/testing1/factory1';            
hive> select * from factory;                                                                      
OK
emp1    500 40  northregion
emp2    501 45  northregion
emp3    502 50  northregion
EMP10   200 25  southregion
EMP11   201 27  southregion
EMP12   202 30  southregion

Next, I created a new file, factory3.txt, to add as a new partition of table factory.

cat factory3.txt
user1,100,25
user2,101,27
user3,102,30

- Create the path and copy the table data

[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'

Then I ran the following query to update the metastore with the newly added partition:

MSCK REPAIR TABLE factory;

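The table still does not show the contents of factory3.txt as a new partition. What am I doing wrong when adding a partition to table factory?

If I run the alter command instead, the new partition data does show up.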

hive> alter table factory add partition(region='eastregion') location '/user/hive/testing/testing1/factory3';

Why does the MSCK REPAIR TABLE command not work here?

For MSCK to work, the partition directories must follow the naming convention /partition_name=partition_value/.

You have to put the data in a directory named 'region=eastregion' inside the table location directory:

$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregion'
$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregion'
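
To make the naming rule concrete, here is a minimal shell sketch of the check it implies for a table with a single partition column. The helper partition_value_for is hypothetical (it is not part of Hive or Hadoop), and it only compares path strings, so the table location and the candidate path must be written in the same form. Hive's real discovery logic does more than this.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: prints the partition value if
# the path is a directory named <column>=<value> directly under the table
# location, and returns non-zero otherwise.
partition_value_for() {
    local table_location="${1%/}"   # table location, trailing slash removed
    local partition_column="$2"     # partition column name, e.g. region
    local path="${3%/}"             # candidate directory path
    local prefix="${table_location}/${partition_column}="
    local value

    case "$path" in
        "$prefix"*)
            value="${path#"$prefix"}"
            # Reject an empty value and anything nested deeper.
            if [ -n "$value" ] && [ "${value#*/}" = "$value" ]; then
                printf '%s\n' "$value"
                return 0
            fi
            ;;
    esac
    return 1
}

# A directory under the table location that follows the convention.
partition_value_for /user/hive/warehouse/factory region \
    /user/hive/warehouse/factory/region=eastregion \
    || echo "not discoverable"
# prints: eastregion

# The directory from the question: outside the table location, no key=value name.
partition_value_for /user/hive/warehouse/factory region \
    /user/hive/testing/testing1/factory3 \
    || echo "not discoverable"
# prints: not discoverable
```

This matches what the question observed: the factory3 directory only became visible through an explicit alter table ... add partition ... location, because nothing about its path ties it to the table or to a value of region.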
