[英]Data Loaded wrongly into Hive Partitioned table after adding a new column using ALTER
I already have a Hive partitioned table.我已经有一个 Hive 分区表。 I needed to add a new column to the table, so i used ALTER to add the column like below.
我需要向表中添加一个新列,因此我使用 ALTER 添加如下所示的列。
ALTER TABLE TABLE1 ADD COLUMNS(COLUMN6 STRING);
I have my final table load query like this:我有这样的最终表加载查询:
INSERT OVERWRITE table Final table PARTITION(COLUMN4, COLUMN5)
select
stg.Column1,
stg.Column2,
stg.Column3,
stg.Column4(Partition Column),Field Name:Code Sample value - YAHOO.COM
stg.Column5(Partition Column),Field Name:Date Sample Value - 2021-06-25
stg.Column6(New Column) Field Name:reason sample value - Adjustment
from (
select fee.* from (
select
fees.* ,
ROW_NUMBER() OVER (PARTITION BY fees.Column1 ORDER BY fees.Column3 DESC) as RNK
from Stage table fee
) fee
where RNK = 1
) stg
left join (
select Column1 from Final table
where Column5(date) in (select distinct column5(date) from Stage table)
) TGT
on tgt.Column1(id) = stg.Column1(id) where tgt.column1 is null
UNION
select
tgt.column1(id),
tgt.column2,
tgt.column3,
tgt.column4(partiton column),
tgt.column5(partiton column-date),
tgt.column6(New column)
from
Final Table TGT
WHERE TGT.Column5(date) in (select distinct column5(date) from Stage table);"
Now when my job ran today, and when i try to query the final table, i get the below error现在,当我今天的工作运行时,当我尝试查询最终表时,出现以下错误
Invalid partition value 'Adjustment' for DATE partition key: Code=2021-06-25/date=Adjustment
I can figure out something wrong happend around the partition column but unable to figure out what went wrong..Can someone help?我可以找出分区列周围发生的问题,但无法弄清楚出了什么问题..有人可以帮忙吗?
Partition columns should be the last ones in the select.分区列应该是选择中的最后一个。 When you add new column it is being added as the last non-partition column, partition columns remain the last ones, they are not stored in the datafiles, only metadata contains information about partitions.
当您添加新列时,它被添加为最后一个非分区列,分区列仍然是最后一个,它们不存储在数据文件中,只有元数据包含有关分区的信息。 All other columns order also matters, it should match table DDL, check it using
DESCRIBE FORMATTED table_name
.所有其他列的顺序也很重要,它应该匹配表 DDL,使用
DESCRIBE FORMATTED table_name
检查它。
INSERT OVERWRITE table Final table PARTITION(COLUMN4, COLUMN5)
select
stg.Column1,
stg.Column2,
stg.Column3,
stg.Column6 (New column) ------------New column
stg.Column4(Partition Column) ---partition columns
stg.Column5(Partition Column)
...
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.