
How to delete customer information from HDFS

Suppose I have several customers and I store their information (customer_id, customer_name, customer_emailid, etc.) in HDFS. If a customer leaves and asks for their personal information to be removed, I need a way to delete it from HDFS.

I have the following two approaches in mind.

Approach 1:

1. Create an internal (managed) table on top of the HDFS data

2. Create an external table from the first table using filter logic

3. While creating the second table, apply UDFs on specific columns for additional column-level filtering

Approach 2:

Spark: read, filter, write

Is there any other solution?

Approach 2 is also possible in Hive: select, filter, write.

Create a table on top of the directory in HDFS. Whether it is external or managed does not matter in this context, although external is better if you are going to drop the table later and keep the data as is. Then run an INSERT OVERWRITE on the table (or on a partition) from a SELECT on the same table, with a filter that excludes the customers to be removed.

insert overwrite table mytable
select *
  from mytable --the same table
 where customer_id not in (...) --filter rows
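If the data is partitioned, the same idea can be applied one partition at a time, so only the affected partitions are rewritten. Below is a minimal sketch, not a definitive implementation. The path /data/customers, the partition column load_date, the comma-delimited text format, the column types and the IDs 101 and 102 are all assumptions made up for illustration; only the table name and the three customer columns come from the question.

```sql
-- Hypothetical layout: comma-delimited text files under /data/customers,
-- with one subdirectory per partition named load_date=<value>.
CREATE EXTERNAL TABLE IF NOT EXISTS mytable (
  customer_id      BIGINT,
  customer_name    STRING,
  customer_emailid STRING
)
PARTITIONED BY (load_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/customers';

-- Register partition directories that already exist under the table location.
MSCK REPAIR TABLE mytable;

-- Rewrite a single partition without the departing customers.
-- With a static partition spec, the partition column is left out of the SELECT list.
-- 101 and 102 are placeholder IDs.
INSERT OVERWRITE TABLE mytable PARTITION (load_date = '2020-01-01')
SELECT customer_id, customer_name, customer_emailid
  FROM mytable
 WHERE load_date = '2020-01-01'
   AND customer_id NOT IN (101, 102);
```

Note that NOT IN evaluates to NULL for rows whose customer_id is NULL, so such rows are dropped by the overwrite as well. If they must be kept, add OR customer_id IS NULL to the filter.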
