How to get all S3ObjectSummary from an S3 bucket using Scala and the aws-java-sdk?
I have a Scala project and I am trying to implement a service that requires access to an Amazon S3 bucket.
I want to get a list of all the objects in a bucket, yet the result set of s3Client.listObjects is paginated to 1000 items.
One has to fetch multiple objectListings in order to get all results.
I have found an example Java implementation, yet it relies on mutability (overwriting the objectListing in the while loop):
AmazonS3 s3Client = AmazonS3Provider.getS3Client();
ListObjectsRequest req = new ListObjectsRequest()
    .withBucketName(realBucket)
    .withPrefix(!preprefix.equals("") ? preprefix + "/" + prefix : prefix);
ObjectListing objectListing = s3Client.listObjects(req);
List<S3ObjectSummary> summaries = objectListing.getObjectSummaries();
while (objectListing.isTruncated()) {
    objectListing = s3Client.listNextBatchOfObjects(objectListing);
    summaries.addAll(objectListing.getObjectSummaries());
}
While I can translate that into Scala just fine, I want to use a more idiomatic Scala way.
How can I get all pages of a bucket using Scala?
I am now using a recursive approach that fills up a result object during each iteration. Once the last page is reached, it returns the final collection.
The relevant part happens in the getAllSummaries method; I have kept the other implementation details so that others can get it working more easily. (My AmazonS3Config is a basic case class containing my S3 credentials.)
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.regions.Regions
import com.amazonaws.services.s3.model.{ObjectListing, S3ObjectSummary}
import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}
import scala.collection.JavaConverters._

object Starter extends App with Configurable {

  private lazy val client: AmazonS3 = createAmazonClient(this.config.s3)

  val objects = getAllObjects()

  def getAllObjects(): Seq[S3ObjectSummary] = {
    val bucket = "YOUR_BUCKET_NAME"
    val prefix = ""
    val objectListing: ObjectListing = client.listObjects(bucket, prefix)
    getAllSummaries(objectListing)
  }

  private def getAllSummaries(list: ObjectListing,
                              res: Seq[S3ObjectSummary] = Seq.empty[S3ObjectSummary]): Seq[S3ObjectSummary] =
    list.isTruncated match {
      case false => {
        res ++ list.getObjectSummaries.asScala
      }
      case true =>
        val newList = this.client.listNextBatchOfObjects(list)
        getAllSummaries(newList, res ++ newList.getObjectSummaries.asScala)
    }

  private def createAmazonClient(config: AmazonS3Config): AmazonS3 = {
    val region = Regions.valueOf(config.region)
    val awsCredentials = new BasicAWSCredentials(config.accessKey, config.secretKey)
    AmazonS3ClientBuilder
      .standard()
      .withCredentials(new AWSStaticCredentialsProvider(awsCredentials))
      .withRegion(region)
      .build()
  }
}
There is a mistake in the code above: the first batch of data is missed, because the recursive call appends the summaries of the next page instead of those of the current one. You have to change
getAllSummaries(newList, res ++ newList.getObjectSummaries.asScala)
to
getAllSummaries(newList, res ++ list.getObjectSummaries.asScala)
The correct code for the getAllSummaries() function is:
private def getAllSummaries(list: ObjectListing,
                            res: Seq[S3ObjectSummary] = Seq.empty[S3ObjectSummary]): Seq[S3ObjectSummary] =
  list.isTruncated match {
    case false => {
      res ++ list.getObjectSummaries.asScala
    }
    case true =>
      val newList = this.client.listNextBatchOfObjects(list)
      getAllSummaries(newList, res ++ list.getObjectSummaries.asScala)
  }
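But a better style for Scala is:
import scala.annotation.tailrec // required for the @tailrec annotation

@tailrec
def getAllSummaries(list: ObjectListing,
                    acc: Seq[S3ObjectSummary]): Seq[S3ObjectSummary] =
  if (list.isTruncated) {
    // Keep the current page's summaries, then recurse on the next page.
    val newList = client.listNextBatchOfObjects(list)
    getAllSummaries(newList, acc ++ list.getObjectSummaries.asScala)
  } else {
    // Last page: append its summaries and return the accumulated result.
    acc ++ list.getObjectSummaries.asScala
  }
And use it as: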
val objectListing: ObjectListing = client.listObjects(bucket, prefix)
getAllSummaries(objectListing, Seq.empty[S3ObjectSummary])
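As a possible alternative to explicit recursion, the page walk can also be written as a lazy Iterator. The following is only a sketch: the Pagination object and its pages helper are hypothetical names introduced here, not part of the aws-java-sdk, and the helper is kept generic so it can be tried without an S3 client. The S3 usage in the trailing comment assumes the client, bucket and prefix values from the code above.

```scala
object Pagination {

  /** Lazily walks a paginated result (hypothetical helper, not an SDK API).
    *
    * Emits `first`, then keeps calling `next` for as long as `hasMore`
    * reports that the page just emitted was truncated.
    */
  def pages[P](first: P)(hasMore: P => Boolean, next: P => P): Iterator[P] =
    Iterator
      .iterate(Option(first)) { current =>
        // Fetch the following page only if the current one was truncated.
        current.flatMap(p => if (hasMore(p)) Some(next(p)) else None)
      }
      .takeWhile(_.isDefined)
      .collect { case Some(page) => page }
}

// Assumed usage with the S3 client from the answer above:
//
//   val summaries: Vector[S3ObjectSummary] =
//     Pagination
//       .pages(client.listObjects(bucket, prefix))(
//         _.isTruncated,
//         listing => client.listNextBatchOfObjects(listing))
//       .flatMap(_.getObjectSummaries.asScala)
//       .toVector
```

Because the iterator is lazy, a following page is requested only when the consumer asks for it, so callers that need just the first few objects can stop early with take or find.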