
Change the column name of an external partitioned Parquet table in Hive without null/lost data

I have the following table:

CREATE EXTERNAL TABLE aggregate_status(
m_point VARCHAR(50),
territory VARCHAR(50),
reading_meter VARCHAR(50),
meter_type VARCHAR(500)
)
PARTITIONED BY(
insert_date VARCHAR(10))
STORED AS PARQUET
LOCATION '<the s3 route>/aggregate_status'
TBLPROPERTIES(
  'parquet.compression'='SNAPPY'
)

I want to rename the reading_meter column to reading_mode without losing data.

ALTER TABLE works, but the renamed column now shows null.

I'm not the owner of the Hadoop environment I'm working on, so changing properties such as set parquet.column.index.access = true is not an option.

Any help would be appreciated. Thanks.

I managed to find a solution, at least for small amounts of data.

  1. Create a backup of the table, with the column name already changed.
CREATE TABLE aggregate_status_bkp AS
SELECT 
m_point, 
territory,
reading_meter AS reading_mode,
meter_type,
insert_date
FROM aggregate_status;
  2. Perform the ALTER TABLE.
ALTER TABLE aggregate_status CHANGE COLUMN reading_meter reading_mode VARCHAR(50);
  3. INSERT OVERWRITE from the backup into the original table.
-- You may need to temporarily switch dynamic partitioning to nonstrict mode, depending on your configuration. This is safe: strict mode is only a safeguard against accidentally overwriting partitions.

--set hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE aggregate_status PARTITION(insert_date)
SELECT
m_point,
territory,
reading_mode,
meter_type,
insert_date
FROM aggregate_status_bkp;

--set hive.exec.dynamic.partition.mode=strict;

Another situation we want to protect against dynamic partition insert is that the user may accidentally specify all partitions to be dynamic partitions without specifying one static partition, while the original intention is to just overwrite the sub-partitions of one root partition. We define another parameter hive.exec.dynamic.partition.mode=strict to prevent the all-dynamic partition case.

See https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-QueryingandInsertingData
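If switching to nonstrict mode is not possible, the strict-mode rule quoted above can be satisfied by naming the partition statically and overwriting one partition at a time. A minimal sketch, assuming a partition value of '2021-01-01' (hypothetical; substitute each real insert_date value and repeat the statement per partition):

```sql
-- Static-partition variant: allowed in strict mode because the partition
-- is named explicitly. The date literal is a placeholder, not a real value.
INSERT OVERWRITE TABLE aggregate_status PARTITION (insert_date = '2021-01-01')
SELECT
  m_point,
  territory,
  reading_mode,
  meter_type            -- the partition column is NOT selected here
FROM aggregate_status_bkp
WHERE insert_date = '2021-01-01';
```

With a static partition spec, the partition column is left out of the SELECT list, because its value comes from the PARTITION clause.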

  4. (Optional) Delete the backup table once you're finished.
DROP TABLE aggregate_status_bkp;
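Before running the DROP in the last step, it may be worth checking that the rewritten table really matches the backup. The query below is only a sketch of one way to do that, using the table and column names from the steps above: it compares per-partition row counts and non-null counts of reading_mode, and should return no rows if the two tables agree.

```sql
-- Sanity check: list only the partitions where the original table and the
-- backup disagree. An empty result suggests the rewrite preserved the data.
SELECT
  COALESCE(o.insert_date, b.insert_date) AS insert_date,
  o.row_cnt      AS original_rows,
  b.row_cnt      AS backup_rows,
  o.non_null_cnt AS original_non_null,
  b.non_null_cnt AS backup_non_null
FROM (
  SELECT insert_date,
         COUNT(*)            AS row_cnt,
         COUNT(reading_mode) AS non_null_cnt   -- COUNT(col) skips NULLs
  FROM aggregate_status
  GROUP BY insert_date
) o
FULL OUTER JOIN (
  SELECT insert_date,
         COUNT(*)            AS row_cnt,
         COUNT(reading_mode) AS non_null_cnt
  FROM aggregate_status_bkp
  GROUP BY insert_date
) b
ON o.insert_date = b.insert_date
WHERE o.row_cnt IS NULL                 -- partition missing from the original
   OR b.row_cnt IS NULL                 -- partition missing from the backup
   OR o.row_cnt <> b.row_cnt
   OR o.non_null_cnt <> b.non_null_cnt;
```

The FULL OUTER JOIN is used so that a partition present in only one of the two tables also shows up as a mismatch.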
