Change column name of an external partitioned Parquet table in Hive without null/lost data
I have the following table:
CREATE EXTERNAL TABLE aggregate_status(
m_point VARCHAR(50),
territory VARCHAR(50),
reading_meter VARCHAR(50),
meter_type VARCHAR(500)
)
PARTITIONED BY(
insert_date VARCHAR(10))
STORED AS PARQUET
LOCATION '<the s3 route>/aggregate_status'
TBLPROPERTIES(
'parquet.compression'='SNAPPY'
);
I wish to rename the reading_meter column to reading_mode without losing data.
ALTER TABLE works, but the field now shows null.
I'm not the owner of the Hadoop environment I'm working on, so changing properties such as set parquet.column.index.access = true is not an option.
Any help would be appreciated. Thanks.
Managed to find a solution, at least for small amounts of data.
CREATE TABLE aggregate_status_bkp AS
SELECT
m_point,
territory,
reading_meter AS reading_mode,
meter_type,
insert_date
FROM aggregate_status;
ALTER TABLE aggregate_status CHANGE COLUMN reading_meter reading_mode VARCHAR(50);
-- You might need to temporarily disable strict dynamic-partition mode, depending on your case. This is safe, since the setting is only a safeguard against accidental all-dynamic inserts.
--set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE aggregate_status PARTITION(insert_date)
SELECT
m_point,
territory,
reading_mode,
meter_type,
insert_date
FROM aggregate_status_bkp;
--set hive.exec.dynamic.partition.mode=strict;
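If you cannot switch to nonstrict mode, or the table is too large to rewrite in one go, a possible variant is to overwrite one partition at a time with a static partition spec. This is only a sketch: the date '2020-01-01' is a made-up value, and you would repeat the statement for each insert_date that exists in the backup table.

```sql
-- Hypothetical partition value: replace '2020-01-01' with a real insert_date.
-- With a static partition spec the partition column is NOT part of the SELECT list,
-- and no dynamic partitions are involved, so strict mode does not get in the way.
INSERT OVERWRITE TABLE aggregate_status PARTITION (insert_date = '2020-01-01')
SELECT
  m_point,
  territory,
  reading_mode,
  meter_type
FROM aggregate_status_bkp
WHERE insert_date = '2020-01-01';
```

Doing it partition by partition also keeps each job small, at the cost of running one statement per partition.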
Another situation we want to protect against dynamic partition insert is that the user may accidentally specify all partitions to be dynamic partitions without specifying one static partition, while the original intention is to just overwrite the sub-partitions of one root partition. We define another parameter hive.exec.dynamic.partition.mode=strict to prevent the all-dynamic partition case.
See https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-QueryingandInsertingData
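Before dropping the backup, it may be worth checking that the rewritten table really matches it. The query below is a sketch that assumes both tables still exist and the INSERT OVERWRITE has finished. It compares, per partition, the row count and the number of non-null reading_mode values, and returns only the partitions that differ, so an empty result suggests the copy is complete.

```sql
-- Per-partition comparison of the backup (b) against the rewritten table (a).
-- COUNT(reading_mode) counts only non-null values, so it catches the
-- "column shows null" problem as well as missing rows.
SELECT
  b.insert_date,
  b.n_rows AS backup_rows,
  a.n_rows AS rewritten_rows,
  b.n_mode AS backup_non_null_mode,
  a.n_mode AS rewritten_non_null_mode
FROM (
  SELECT insert_date, COUNT(*) AS n_rows, COUNT(reading_mode) AS n_mode
  FROM aggregate_status_bkp
  GROUP BY insert_date
) b
LEFT JOIN (
  SELECT insert_date, COUNT(*) AS n_rows, COUNT(reading_mode) AS n_mode
  FROM aggregate_status
  GROUP BY insert_date
) a
  ON a.insert_date = b.insert_date
WHERE a.n_rows IS NULL          -- partition missing from the rewritten table
   OR a.n_rows <> b.n_rows      -- row counts differ
   OR a.n_mode <> b.n_mode;     -- reading_mode lost values
```

Only run the DROP TABLE once this comes back empty.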
DROP TABLE aggregate_status_bkp;