We have 1,000,000 files in an S3 folder. Every hour, some new files are added to that folder.
Using Java, how do I find the files added in the last 5 minutes or the last hour?
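The most straightforward way to do this is to scan the bucket and compare each object's last-modified timestamp against a cutoff. A list request returns at most 1,000 keys, so with a million objects every scan costs on the order of a thousand requests. This is expensive, slow, and really a bad idea.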
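ObjectListing objList = s3.listObjects(bucketname);
while (true) {
    for (S3ObjectSummary obj : objList.getObjectSummaries()) {
        // compare obj.getLastModified(), a Date object
    }
    if (!objList.isTruncated()) break;
    objList = s3.listNextBatchOfObjects(objList); // results arrive 1,000 keys at a time
}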
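The comparison inside that loop could look roughly like the sketch below. It uses only java.time and java.util.Date, so it runs without the AWS SDK; the class name RecentFilter and the method isRecent are made up for illustration and are not part of any library.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Date;

// Hypothetical helper, not part of the AWS SDK.
class RecentFilter {
    // True when lastModified falls inside the window that ends at 'now'.
    static boolean isRecent(Date lastModified, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        return !lastModified.toInstant().isBefore(cutoff);
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Date threeMinutesAgo = Date.from(now.minus(Duration.ofMinutes(3)));
        // Inside a 5 minute window, outside a 1 minute window.
        System.out.println(isRecent(threeMinutesAgo, now, Duration.ofMinutes(5)));
        System.out.println(isRecent(threeMinutesAgo, now, Duration.ofMinutes(1)));
    }
}
```

In the scan you would pass obj.getLastModified() as the first argument and Duration.ofMinutes(5) or Duration.ofHours(1) as the window. Capturing 'now' once before the scan keeps the cutoff stable while the listing pages through.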
Second, you could name your uploads so that the keys sort lexicographically by date, then list by key prefix. In the following case I'm listing the entries for a given hour:
s3.listObjects(bucketname, "2015-02-01-15");
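A window such as "the last 5 minutes" can straddle an hour boundary, so you may need to list more than one prefix. Here is a minimal sketch of working out which hourly prefixes to ask for. It assumes keys begin with a UTC timestamp in the same yyyy-MM-dd-HH form as the example above; the class name HourPrefixes and the method between are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes keys start with a UTC "yyyy-MM-dd-HH" stamp.
class HourPrefixes {
    private static final DateTimeFormatter HOUR_KEY =
            DateTimeFormatter.ofPattern("uuuu-MM-dd-HH");

    // Returns the hourly key prefixes, oldest first, covering [from, to].
    static List<String> between(Instant from, Instant to) {
        ZonedDateTime hour = from.atZone(ZoneOffset.UTC).truncatedTo(ChronoUnit.HOURS);
        ZonedDateTime end = to.atZone(ZoneOffset.UTC);
        List<String> prefixes = new ArrayList<>();
        while (!hour.isAfter(end)) {
            prefixes.add(HOUR_KEY.format(hour));
            hour = hour.plusHours(1);
        }
        return prefixes;
    }
}
```

Each returned prefix would go into its own s3.listObjects(bucketname, prefix) call. An hourly prefix still returns everything uploaded in that hour, so for a 5 minute window you would either filter the results by last-modified time or put minutes into the key as well.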
Finally, the best option is to use the relatively new S3 Event Notifications to add each new key to an SQS queue. Your consumer then reads only the keys that actually arrived, instead of listing the bucket. I'm not going to detail this, even in pseudocode, but Eric Hammond has a good entry detailing how to do this.