When a Hive external table or partition is dropped, only the metadata is removed from the Hive metastore. The underlying data in HDFS or the Azure storage account is not deleted. What are the options for deleting the data when the table or partition is dropped?
I have been doing some research, and these are my findings:
Option 1: Drop the table/partition, then remove the corresponding files in HDFS (or in Azure Blob storage if using HDInsight).
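A minimal sketch of Option 1, assuming it is run from the Hive CLI, where the `dfs` command passes through to the Hadoop filesystem shell. The path `/data/poc_drop_partition/...` is a hypothetical location used only for illustration; the real one should be read from the `Location` field of the `DESCRIBE FORMATTED` output (on HDInsight it would typically be a storage URI rather than a plain HDFS path).

```sql
-- Look up where the partition's files live (check the "Location" field).
DESCRIBE FORMATTED poc_drop_partition PARTITION (partition_date = '2017-10-11');

-- Drop the partition: for an external table this removes metadata only.
ALTER TABLE poc_drop_partition DROP IF EXISTS PARTITION (partition_date = '2017-10-11');

-- Remove the files separately. The path below is hypothetical;
-- substitute the Location reported by DESCRIBE FORMATTED above.
dfs -rm -r /data/poc_drop_partition/partition_date=2017-10-11;
```

The lookup has to happen before the drop, because the partition's location is no longer available from the metastore afterwards.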
Option 2: Update the Hive metastore to mark the table as managed, drop the partition, and then change the table property back to external, as below.
ALTER TABLE poc_drop_partition SET TBLPROPERTIES('EXTERNAL'='FALSE') ;
ALTER TABLE poc_drop_partition DROP IF EXISTS PARTITION(partition_date <= '2017-10-11');
ALTER TABLE poc_drop_partition SET TBLPROPERTIES('EXTERNAL'='TRUE') ;
Similarly, while the table is marked as managed, a DROP TABLE statement will drop the table and the underlying data files.
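For the DROP TABLE case, the same property toggle might look like the sketch below. It assumes the same `poc_drop_partition` table, and there is no need to switch the property back because the table no longer exists afterwards.

```sql
-- Mark the table as managed so that dropping it also deletes its data.
ALTER TABLE poc_drop_partition SET TBLPROPERTIES('EXTERNAL'='FALSE');

-- Drop the table; the underlying data files are removed with it.
DROP TABLE IF EXISTS poc_drop_partition;
```

If the filesystem trash is enabled, the deleted files are moved to the trash directory rather than removed immediately; adding `PURGE` to the `DROP TABLE` statement skips the trash.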
Is there a better way of doing this? I am aware that there is a JIRA ticket for TRUNCATE functionality that is yet to be worked on.