
Incrementally copy files from S3 to local HDFS

I have an app that writes data to S3 daily, hourly, or just at random times, and another app that reads data from S3 into local HBase. Is there any way to tell which file was the last one uploaded in the previous update, and then read only the files after it? In other words, can I copy the files incrementally?

For example: on day 1, App1 writes files 1, 2, and 3 to folder 1, and App2 reads those 3 files into HBase. On day 4, App1 writes files 4 and 5 to folder 1 and files 6, 7, and 8 to folder 2; App2 then needs to read 4 and 5 from folder 1, followed by 6, 7, and 8 from folder 2.

Thanks.

The LastModified field of each S3 object can be used to process data based on upload time. This requires some bookkeeping logic on the client side to distinguish items that have already been processed from new ones. The simplest approach is to store the timestamp of the last object you processed; everything with a later LastModified is then considered new.
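A minimal sketch of that checkpoint logic, assuming you have a listing of (LastModified, key) pairs from some S3 list call and persist the checkpoint timestamp yourself (e.g. in a local file or an HBase row; both the sample keys and the storage choice are illustrative, not from the original post):

```python
from datetime import datetime, timezone

# Hypothetical listing as (last_modified, key) pairs, e.g. obtained
# from an S3 "list objects" call.
listing = [
    (datetime(2012, 7, 24, 18, 29, tzinfo=timezone.utc), "s3://test/dl.pdf"),
    (datetime(2012, 7, 26, 9, 0, tzinfo=timezone.utc), "s3://test/new1.dat"),
    (datetime(2012, 7, 27, 12, 30, tzinfo=timezone.utc), "s3://test/new2.dat"),
]

def new_objects(listing, checkpoint):
    """Return objects uploaded strictly after the checkpoint, oldest first."""
    return sorted((ts, key) for ts, key in listing if ts > checkpoint)

# Checkpoint saved after the previous run (assumption: you persist this
# somewhere durable between runs).
checkpoint = datetime(2012, 7, 24, 18, 29, tzinfo=timezone.utc)

fresh = new_objects(listing, checkpoint)
for ts, key in fresh:
    print(ts.isoformat(), key)  # copy / process the object here

# Advance the checkpoint to the newest object just processed.
if fresh:
    checkpoint = fresh[-1][0]
```

Using a strict `>` comparison means an object whose timestamp exactly equals the checkpoint is skipped; if several files can share one timestamp, you may want to track the last processed key as a tiebreaker as well.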

Example:

s3cmd ls s3://test
2012-07-24 18:29  36303234   s3://test/dl.pdf

Note the date at the front of each listing line.
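If you drive this from `s3cmd ls` output, you can parse those leading date/time columns and filter against the stored checkpoint. A small sketch, assuming the four-column format shown above (date, time, size, key; the sample listing text here is made up for illustration):

```python
from datetime import datetime

# Sample `s3cmd ls` output (assumption: default "date time size key" columns).
raw = """\
2012-07-24 18:29  36303234   s3://test/dl.pdf
2012-07-26 09:00      1024   s3://test/folder1/file4
2012-07-27 12:30      2048   s3://test/folder2/file6
"""

def parse_listing(text):
    """Parse s3cmd ls lines into (timestamp, key) pairs."""
    items = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank lines and DIR entries
        date, time_, _size, key = parts
        items.append((datetime.strptime(f"{date} {time_}", "%Y-%m-%d %H:%M"), key))
    return items

# Timestamp of the last file processed in the previous run.
last_processed = datetime(2012, 7, 24, 18, 29)

# Only files uploaded after the checkpoint are copied this round.
to_copy = [key for ts, key in parse_listing(raw) if ts > last_processed]
print(to_copy)
```

Note that `s3cmd ls` only shows minute resolution; if you need finer granularity, an API listing (which returns full LastModified timestamps) is more reliable than parsing CLI output.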
