
Data loaded wrongly into Hive partitioned table after adding a new column using ALTER

I already have a Hive partitioned table. I needed to add a new column to it, so I used ALTER to add the column as shown below.

ALTER TABLE TABLE1 ADD COLUMNS(COLUMN6 STRING);

My final table load query looks like this:

INSERT OVERWRITE table Final table  PARTITION(COLUMN4, COLUMN5)
select
stg.Column1,
stg.Column2,
stg.Column3,
stg.Column4,  -- partition column; field name: Code, sample value: YAHOO.COM
stg.Column5,  -- partition column; field name: Date, sample value: 2021-06-25
stg.Column6   -- new column; field name: reason, sample value: Adjustment
from (
         select fee.* from (
             select 
               fees.* , 
               ROW_NUMBER() OVER (PARTITION BY fees.Column1 ORDER BY fees.Column3 DESC) as RNK
             from Stage table fees
         ) fee
         where RNK = 1
     ) stg
     left join (
         select Column1 from Final table
         where Column5 in (select distinct Column5 from Stage table)  -- Column5 is the date
     ) TGT
     on tgt.Column1 = stg.Column1  -- Column1 is the id
     where tgt.Column1 is null
UNION
select 
tgt.column1,  -- id
tgt.column2,
tgt.column3,
tgt.column4,  -- partition column
tgt.column5,  -- partition column (date)
tgt.column6   -- new column
from 
Final Table TGT
      WHERE TGT.Column5 in (select distinct Column5 from Stage table);

After my job ran today, I tried to query the final table and got the error below:

Invalid partition value 'Adjustment' for DATE partition key: Code=2021-06-25/date=Adjustment

I can tell that something went wrong around the partition columns, but I can't figure out what. Can someone help?

Partition columns should be the last ones in the SELECT. Hive maps the selected columns to the table's columns by position, not by name. When you add a new column, it is added as the last non-partition column; the partition columns still come after it. Partition columns are not stored in the data files; only the metadata contains information about partitions. The order of all the other columns matters too: it should match the table DDL, which you can check with DESCRIBE FORMATTED table_name.

INSERT OVERWRITE table Final table  PARTITION(COLUMN4, COLUMN5)
select
stg.Column1,
stg.Column2,
stg.Column3,
stg.Column6,  -- new column: the last non-partition column
stg.Column4,  -- partition columns come last, in PARTITION(...) order
stg.Column5
...
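
To see why the original column order produced the partition named in the error, here is a minimal sketch on a throwaway table. The table name demo_final, the partition column names code and load_date, and the literal values are assumptions for illustration only (the real names are not shown in the question). It also assumes a Hive version that allows SELECT without a FROM clause; the two SET statements are the usual switches for fully dynamic partition inserts.

```sql
-- Hypothetical demo table: three data columns plus two partition columns.
CREATE TABLE demo_final (
  column1 STRING,
  column2 STRING,
  column3 STRING
)
PARTITIONED BY (code STRING, load_date STRING);

-- The new column becomes the last NON-partition column.
ALTER TABLE demo_final ADD COLUMNS (column6 STRING);

-- Lists the data columns (column1, column2, column3, column6) first,
-- then the partition columns (code, load_date).
DESCRIBE FORMATTED demo_final;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Same order as the SELECT in the question. Values are matched by position:
--   4th value 'YAHOO.COM'  -> column6
--   5th value '2021-06-25' -> code
--   6th value 'Adjustment' -> load_date
INSERT OVERWRITE TABLE demo_final PARTITION (code, load_date)
SELECT 'id1', 'a', 'b', 'YAHOO.COM', '2021-06-25', 'Adjustment';

-- Expected to list the misplaced partition: code=2021-06-25/load_date=Adjustment
SHOW PARTITIONS demo_final;

-- Corrected order: new column first, then the partition columns.
INSERT OVERWRITE TABLE demo_final PARTITION (code, load_date)
SELECT 'id1', 'a', 'b', 'Adjustment', 'YAHOO.COM', '2021-06-25';
```

In the demo both partition keys are strings, so the misplaced insert succeeds silently. In the question's table, the error message suggests the second partition key is a DATE, which is why the value 'Adjustment' is rejected when the partition is read. The same ordering applies to every branch of a UNION that feeds the insert, so the second SELECT in the question needs column6 moved ahead of column4 and column5 as well.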

