Using Java, find files added to an S3 bucket folder in the last hour

We have 1,000,000 files in an S3 folder. Every hour some new files are added to the above folder.

Using Java, how do I find the files added in the last 5 minutes or the last hour?

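The most straightforward way to do this is to scan the bucket and look for new timestamps. This is expensive and slow, and really a bad idea: each list call returns at most 1,000 keys, so a full scan of 1,000,000 files takes about 1,000 requests every time you check.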

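ObjectListing objList = s3.listObjects(bucketname);
for (S3ObjectSummary obj : objList.getObjectSummaries()) {
  // compare obj.getLastModified(), a Date object, against your cutoff time
}
// one listing holds at most 1,000 keys; while objList.isTruncated(), call s3.listNextBatchOfObjects(objList)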
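The timestamp comparison itself can be sketched without the SDK. This is only an illustration: `RecentKeys` and `addedWithin` are hypothetical names, and the map is assumed to have been filled from `obj.getKey()` and `obj.getLastModified()` while paging through the listing. Note that S3 also updates the last-modified time when an existing key is overwritten, so "recent" here means added or replaced.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the AWS SDK.
final class RecentKeys {
    // Returns the keys whose last-modified time is at or after (now - window).
    // Assumes lastModifiedByKey was built from the S3 object summaries.
    static List<String> addedWithin(Map<String, Date> lastModifiedByKey,
                                    Duration window, Instant now) {
        Instant cutoff = now.minus(window);
        List<String> recent = new ArrayList<>();
        for (Map.Entry<String, Date> entry : lastModifiedByKey.entrySet()) {
            // Keep the key unless it was modified before the cutoff.
            if (!entry.getValue().toInstant().isBefore(cutoff)) {
                recent.add(entry.getKey());
            }
        }
        return recent;
    }
}
```

Calling `RecentKeys.addedWithin(map, Duration.ofMinutes(5), Instant.now())` would cover the 5-minute case, and `Duration.ofHours(1)` the hourly one.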

Second, you could name your uploads so that the keys sort lexicographically by date, then list only the keys that share a given prefix. In the following case I'm listing the entries for a given hour:

s3.listObjects(bucketname, "2015-02-01-15");
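A rough sketch of building such prefixes from the current time follows. It assumes keys start with a UTC timestamp in the `yyyy-MM-dd-HH` layout shown above; `HourPrefix` and its methods are hypothetical names, not SDK calls. Because "the last hour" usually straddles two clock hours, the helper returns both the previous and the current hour prefix, and each would be passed to `s3.listObjects(bucketname, prefix)` in turn.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the AWS SDK.
final class HourPrefix {
    // Assumption: uploads are keyed by UTC time as yyyy-MM-dd-HH...
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH").withZone(ZoneOffset.UTC);

    // Prefix for the clock hour that contains the given instant.
    static String forInstant(Instant t) {
        return HOUR.format(t);
    }

    // Prefixes covering the last 60 minutes: previous hour, then current hour.
    static List<String> lastHourPrefixes(Instant now) {
        return Arrays.asList(forInstant(now.minus(1, ChronoUnit.HOURS)), forInstant(now));
    }
}
```

Objects listed under the previous hour's prefix still need a timestamp check if you want exactly the last 60 minutes rather than the two whole clock hours.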

Finally, the best option is to use the relatively new S3 Event Notifications to have S3 send a message to an SQS queue for each new key, so that you read the new keys from the queue instead of listing the bucket. I'm not going to detail this, even in pseudocode, but Eric Hammond has a good entry detailing how to do this.
