Change column name of an external partitioned Parquet table in Hive without null/lost data

Question

I have the following table:

CREATE EXTERNAL TABLE aggregate_status(
m_point VARCHAR(50),
territory VARCHAR(50),
reading_meter VARCHAR(50),
meter_type VARCHAR(500)
)
PARTITIONED BY(
insert_date VARCHAR(10))
STORED AS PARQUET
LOCATION '<the s3 route>/aggregate_status'
TBLPROPERTIES(
  'parquet.compression'='SNAPPY'
)

I want to rename the reading_meter column to reading_mode without losing data.

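ALTER TABLE works, but the field now shows NULL, presumably because Hive matches Parquet columns by name and the existing files still contain the column as reading_meter.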

I don't own the Hadoop environment I'm working on, so changing properties (for example, set parquet.column.index.access = true) is not an option.

Any help would be appreciated. Thanks.

Answer 1

I managed to find a solution, at least for small amounts of data.

Create a backup of the table with the column already renamed.

CREATE TABLE aggregate_status_bkp AS
SELECT
m_point,
territory,
reading_meter AS reading_mode,
meter_type,
insert_date
FROM aggregate_status;
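Before running the ALTER TABLE, it may be worth checking that the backup is complete. The queries below are a minimal sketch, not part of the original answer: they compare the row count of the original table with the backup, and use COUNT(column), which counts only non-NULL values, to confirm the renamed column actually carries data in the backup.

```sql
-- Sanity check (not part of the original answer).
-- Source table: total rows and rows with a non-NULL reading_meter.
SELECT COUNT(*) AS total_rows,
       COUNT(reading_meter) AS non_null_values
FROM aggregate_status;

-- Backup table: the same two numbers, using the new column name.
-- COUNT(column) skips NULLs, so both queries should return identical results.
SELECT COUNT(*) AS total_rows,
       COUNT(reading_mode) AS non_null_values
FROM aggregate_status_bkp;
```

Both queries should return the same two numbers. After the INSERT OVERWRITE step, running the second query against aggregate_status (which by then has the reading_mode column) should return them again.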

Perform the ALTER TABLE

ALTER TABLE aggregate_status CHANGE COLUMN reading_meter reading_mode VARCHAR(50);

Run an INSERT OVERWRITE from the backup into the original table. This rewrites the Parquet files so that they store the column under its new name.

-- Depending on your case, you might need to temporarily disable strict dynamic partition mode. This is safe, since strict mode is only a safeguard, not a lock on the data.

--set hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE aggregate_status PARTITION(insert_date)
SELECT
m_point,
territory,
reading_mode,
meter_type,
insert_date
FROM aggregate_status_bkp;

--set hive.exec.dynamic.partition.mode=strict;

The Hive tutorial explains what strict mode protects against:

"Another situation we want to protect against dynamic partition insert is that the user may accidentally specify all partitions to be dynamic partitions without specifying one static partition, while the original intention is to just overwrite the sub-partitions of one root partition. We define another parameter hive.exec.dynamic.partition.mode=strict to prevent the all-dynamic partition case."

See https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-QueryingandInsertingData
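If you cannot (or would rather not) switch to nonstrict mode, a static partition insert is allowed in strict mode, because it names the partition explicitly. A minimal sketch that overwrites one partition at a time; '2022-01-01' is a hypothetical placeholder, so substitute each insert_date value present in the backup:

```sql
-- Static-partition alternative that works with hive.exec.dynamic.partition.mode=strict.
-- '2022-01-01' is a hypothetical partition value; repeat once per insert_date.
-- With a static partition, insert_date is NOT included in the SELECT list.
INSERT OVERWRITE TABLE aggregate_status PARTITION (insert_date = '2022-01-01')
SELECT
m_point,
territory,
reading_mode,
meter_type
FROM aggregate_status_bkp
WHERE insert_date = '2022-01-01';
```

You can list the existing partition values with SHOW PARTITIONS aggregate_status;. This trades one statement per partition for not touching the session setting, so it mainly makes sense when there are only a few partitions.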

Optional: drop the backup table once you're finished.

DROP TABLE aggregate_status_bkp;

solution1
0 2022-09-08 16:23:14
