Using Java, find files added to an S3 bucket folder in the last hour
We have 1,000,000 files in an S3 folder. Every hour some new files are added to the above folder.
Using Java, how do I find the files added in the last 5 minutes or the last hour?
The most straightforward way to do this is to scan the bucket and look for new timestamps. This is expensive, slow, and really a bad idea.
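ObjectListing objList = s3.listObjects(bucketname);
while (true) {
    for (S3ObjectSummary obj : objList.getObjectSummaries()) {
        // compare obj.getLastModified(), a Date object, against your cutoff
    }
    if (!objList.isTruncated()) break;
    objList = s3.listNextBatchOfObjects(objList); // each call returns at most 1,000 keys
}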
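The timestamp comparison in the scan can be sketched on its own, without the AWS SDK. This is a minimal sketch under stated assumptions: `RecentKeyFilter` and its `Entry` class are hypothetical names, with `Entry` standing in for the key and last-modified `Date` that each object summary exposes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class RecentKeyFilter {
    // Hypothetical stand-in for the two object-summary fields used here.
    static final class Entry {
        final String key;
        final Date lastModified;

        Entry(String key, Date lastModified) {
            this.key = key;
            this.lastModified = lastModified;
        }
    }

    // Returns the keys whose lastModified is strictly newer than (now - window).
    static List<String> keysModifiedWithin(List<Entry> entries, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        List<String> recent = new ArrayList<>();
        for (Entry e : entries) {
            if (e.lastModified.toInstant().isAfter(cutoff)) {
                recent.add(e.key);
            }
        }
        return recent;
    }
}
```

Calling `keysModifiedWithin(entries, Instant.now(), Duration.ofHours(1))` covers the last hour, and `Duration.ofMinutes(5)` covers the last 5 minutes. The filter itself is cheap; the cost of this approach is listing all 1,000,000 keys to feed it.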
Second, you could structure your upload keys so that they sort lexicographically by date, then list by a key prefix. In the following case I'm scanning for entries in a given hour:
s3.listObjects(bucketname, "2015-02-01-15");
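A "last hour" window usually straddles two clock hours, so more than one prefix has to be listed. The following is a rough sketch of generating those prefixes. It assumes keys begin with a `yyyy-MM-dd-HH` stamp as in the example above and that the stamp is in UTC; the `HourPrefixes` class and its `between` method are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

class HourPrefixes {
    // Matches the "2015-02-01-15" key layout; UTC is an assumption.
    private static final DateTimeFormatter HOUR_PREFIX =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH").withZone(ZoneOffset.UTC);

    // Returns one prefix per clock hour touched by the interval [from, to], oldest first.
    static List<String> between(Instant from, Instant to) {
        List<String> prefixes = new ArrayList<>();
        Instant hour = from.truncatedTo(ChronoUnit.HOURS);
        while (!hour.isAfter(to)) {
            prefixes.add(HOUR_PREFIX.format(hour));
            hour = hour.plus(1, ChronoUnit.HOURS);
        }
        return prefixes;
    }
}
```

Each returned prefix would be passed to `s3.listObjects(bucketname, prefix)`. The oldest prefix covers a whole clock hour, so its objects may still need a timestamp check to drop those that fall before the start of the window.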
Finally, the best option is to use the relatively new S3 Event Notifications to add each key to an SQS queue. I'm not going to detail this, even in pseudocode, but Eric Hammond has a good entry detailing how to do this.