
How to get all S3ObjectSummary from an S3 bucket using Scala and the aws-java-sdk?

I have a Scala project and I am trying to implement a service that requires access to an Amazon S3 bucket.

I want to get a list of all the objects in a bucket, yet the result set of s3Client.listObjects is paginated to 1000 items.

One has to fetch multiple objectListings in order to get all results.

I have found an example Java implementation, yet it relies on mutability (overwriting the objectListing in the while loop):

AmazonS3 s3Client = AmazonS3Provider.getS3Client();
ListObjectsRequest req = new ListObjectsRequest().withBucketName(realBucket).withPrefix(!preprefix.equals("") ? preprefix + "/" + prefix : prefix);
ObjectListing objectListing = s3Client.listObjects(req);
List<S3ObjectSummary> summaries = objectListing.getObjectSummaries();

while (objectListing.isTruncated()) {
    objectListing = s3Client.listNextBatchOfObjects(objectListing);
    summaries.addAll(objectListing.getObjectSummaries());
}

While I can translate that into Scala just fine, I would like a more idiomatic Scala approach.

How can I get all pages of a bucket using Scala?

I am now using a recursive approach, filling up a result object during each iteration. Once the last page is reached, it returns the final collection.

The relevant part happens in the getAllSummaries method; I kept the other implementation details so that others can get it working more easily. (My AmazonS3Config is a basic case class containing my S3 credentials.)

import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.regions.Regions
import com.amazonaws.services.s3.model.{ObjectListing, S3ObjectSummary}
import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}

import scala.collection.JavaConverters._

object Starter extends App with Configurable {

  private lazy val client: AmazonS3 = createAmazonClient(this.config.s3)

  val objects = getAllObjects()

  def getAllObjects(): Seq[S3ObjectSummary] = {
    val bucket = "YOUR_BUCKET_NAME"
    val prefix = ""

    val objectListing: ObjectListing = client.listObjects(bucket, prefix)

    getAllSummaries(objectListing)
  }

  private def getAllSummaries(list: ObjectListing,
                              res: Seq[S3ObjectSummary] = Seq.empty[S3ObjectSummary]): Seq[S3ObjectSummary] =
    list.isTruncated match {
      case false => {
        res ++ list.getObjectSummaries.asScala
      }
      case true =>
        val newList = this.client.listNextBatchOfObjects(list)
        getAllSummaries(newList, res ++ newList.getObjectSummaries.asScala)

    }

  private def createAmazonClient(config: AmazonS3Config): AmazonS3 = {
    val region = Regions.valueOf(config.region)
    val awsCredentials = new BasicAWSCredentials(config.accessKey, config.secretKey)

    AmazonS3ClientBuilder
      .standard()
      .withCredentials(new AWSStaticCredentialsProvider(awsCredentials))
      .withRegion(region)
      .build()
  }
}
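
The Configurable trait and the AmazonS3Config case class are not shown in the post. A minimal sketch of what they might look like follows; only the field names (s3, region, accessKey, secretKey) come from the code above, while AppConfig, the environment variable names, and the default values are assumptions made for illustration.

```scala
// Hypothetical configuration types; only the field names are taken from the code above.
final case class AmazonS3Config(region: String, accessKey: String, secretKey: String)
final case class AppConfig(s3: AmazonS3Config)

trait Configurable {
  // Assumption: settings are read from environment variables with made-up names.
  // The original post does not show where its configuration comes from.
  lazy val config: AppConfig = AppConfig(
    AmazonS3Config(
      region    = sys.env.getOrElse("S3_REGION", "EU_WEST_1"),
      accessKey = sys.env.getOrElse("S3_ACCESS_KEY", ""),
      secretKey = sys.env.getOrElse("S3_SECRET_KEY", "")
    )
  )
}
```

Note that Regions.valueOf expects the enum constant name (for example EU_WEST_1), not the region id (eu-west-1); Regions.fromName accepts the latter.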

There is a mistake in the recursive code from the other answer: the first batch of data is missed (and the last batch is appended twice), because the accumulator takes the summaries of newList instead of list. Change getAllSummaries(newList, res ++ newList.getObjectSummaries.asScala) to getAllSummaries(newList, res ++ list.getObjectSummaries.asScala). The correct code for the getAllSummaries() function is:

  private def getAllSummaries(list: ObjectListing,
                              res: Seq[S3ObjectSummary] = Seq.empty[S3ObjectSummary]): Seq[S3ObjectSummary] =
    list.isTruncated match {
      case false => {
        res ++ list.getObjectSummaries.asScala
      }
      case true =>
        val newList = this.client.listNextBatchOfObjects(list)
        getAllSummaries(newList, res ++ list.getObjectSummaries.asScala)

    }
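
The effect of accumulating list versus newList can be checked without touching S3. The following is a minimal sketch that replaces ObjectListing with a hypothetical in-memory Page type; Page, PagingCheck, buggy and fixed are illustrative names and not part of the AWS SDK.

```scala
// Hypothetical stand-in for ObjectListing: a page of items plus an optional next page.
final case class Page(items: Seq[String], next: Option[Page])

object PagingCheck {
  // Accumulates the *next* page's items: the first page is skipped
  // and the last page ends up in the result twice.
  def buggy(page: Page, res: Seq[String] = Seq.empty): Seq[String] =
    page.next match {
      case None           => res ++ page.items
      case Some(nextPage) => buggy(nextPage, res ++ nextPage.items)
    }

  // Accumulates the *current* page's items before moving on.
  def fixed(page: Page, res: Seq[String] = Seq.empty): Seq[String] =
    page.next match {
      case None           => res ++ page.items
      case Some(nextPage) => fixed(nextPage, res ++ page.items)
    }

  // Three pages: (a, b) -> (c, d) -> (e)
  val pages: Page =
    Page(Seq("a", "b"), Some(Page(Seq("c", "d"), Some(Page(Seq("e"), None)))))
}
```

With this data, PagingCheck.buggy(PagingCheck.pages) yields c, d, e, e, while PagingCheck.fixed(PagingCheck.pages) yields a, b, c, d, e.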

But a better style for Scala is:

import scala.annotation.tailrec

@tailrec
def getAllSummaries(list: ObjectListing,
                    acc: Seq[S3ObjectSummary]): Seq[S3ObjectSummary] =
  if (list.isTruncated) {
    val newList = client.listNextBatchOfObjects(list)
    getAllSummaries(newList, acc ++ list.getObjectSummaries.asScala)
  } else {
    acc ++ list.getObjectSummaries.asScala
  }

And use it as:

val objectListing: ObjectListing = client.listObjects(bucket, prefix)
getAllSummaries(objectListing, Seq.empty[S3ObjectSummary])
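
Another option, not taken from the answers above, is to expose the pages as a lazy Iterator and flatten them at the call site. The sketch below is generic over the page type so it can be run without AWS; Paging, allPages, FakePage and PagingDemo are hypothetical names, and the S3 usage in the trailing comment assumes the same client, bucket and prefix values as in the code above.

```scala
object Paging {
  // Starting from `first`, keep asking `next` for another page until it returns None.
  // Pages are fetched lazily, one at a time, as the iterator is consumed.
  def allPages[P](first: P)(next: P => Option[P]): Iterator[P] =
    Iterator
      .iterate[Option[P]](Some(first))(_.flatMap(next))
      .takeWhile(_.isDefined)
      .collect { case Some(page) => page }
}

// Hypothetical in-memory page type used only to exercise the helper.
final case class FakePage(items: Seq[String], next: Option[FakePage])

object PagingDemo {
  val first: FakePage = FakePage(Seq("a", "b"), Some(FakePage(Seq("c"), None)))

  // Visits both pages in order and collects a, b, c.
  val all: List[String] = Paging.allPages(first)(_.next).flatMap(_.items).toList
}

// With the AWS SDK types from the question, usage would look roughly like this
// (assumes `client`, `bucket`, `prefix` and the JavaConverters import are in scope):
//
//   val summaries: Vector[S3ObjectSummary] =
//     Paging.allPages(client.listObjects(bucket, prefix)) { l =>
//       if (l.isTruncated) Some(client.listNextBatchOfObjects(l)) else None
//     }.flatMap(_.getObjectSummaries.asScala).toVector
```

Compared with the recursive version, this keeps the paging logic separate from the accumulation, so the caller can stop early (for example with take or find) without fetching the remaining pages.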


