
Update/Edit records in HDFS using Hive

I have some records of people in HDFS. I use an external table in Hive to view the data and run analytics on it, and I can also use the data externally in other programs.

Recently I got a use case where I have to update the data in HDFS. From the documentation I learned that we can't update or delete data through an external table.

Another problem is that the data is not in ORC format; it is actually in TEXTFILE format, so I can't update or delete data in an internal table either. As it is in production, I can't copy it anywhere to convert it to ORC format. Please suggest how I can edit the data in HDFS.

You can update or delete records using INSERT OVERWRITE with a SELECT from the same table, applying filters and additional transformations:

insert overwrite table mytable
select col1, --apply transformations here
       col2, --for example: case when col2=something then something_else else col2 end as col2
       ...
       colN
  from mytable
 where ... --keep only the rows you want to retain; rows that fail this filter are deleted

This approach works for both external and managed tables and for all storage formats. Write a SELECT that returns the required dataset, then put INSERT OVERWRITE in front of it.
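As a concrete illustration of the pattern, here is a minimal sketch. The table name `people`, its columns (`id`, `name`, `city`), and the literal values are all hypothetical, since the question does not give the actual schema. It updates one record and deletes another in a single rewrite:

```sql
-- Hypothetical schema: people(id INT, name STRING, city STRING).
-- Preview the result first by running the SELECT part on its own.
INSERT OVERWRITE TABLE people
SELECT id,
       name,
       -- "update": change city for id 42, leave every other row as it is
       CASE WHEN id = 42 THEN 'Berlin' ELSE city END AS city
  FROM people
 -- "delete": drop id 7; the IS NULL check keeps rows whose id is NULL,
 -- which a bare "id <> 7" would silently filter out
 WHERE id IS NULL OR id <> 7;
```

Note that this rewrites the whole table (or partition) rather than modifying rows in place, so the cost grows with the size of the table, not with the number of changed rows. The SELECT list must return the columns in the same order as the table definition.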


 