
How to get all S3ObjectSummary objects from an S3 bucket using Scala and the aws-java-sdk?

I have a Scala project in which I am trying to implement a service that needs access to an Amazon S3 bucket.

I want to get a list of all the objects in a bucket, but the result of s3Client.listObjects is paginated to at most 1000 items per page.

One has to fetch multiple ObjectListing pages in order to get all results.

I have found an example Java implementation, but it relies on mutability (it reassigns objectListing inside the while loop):

AmazonS3 s3Client = AmazonS3Provider.getS3Client();
ListObjectsRequest req = new ListObjectsRequest().withBucketName(realBucket).withPrefix(!preprefix.equals("") ? preprefix + "/" + prefix : prefix);
ObjectListing objectListing = s3Client.listObjects(req);
List<S3ObjectSummary> summaries = objectListing.getObjectSummaries();

while (objectListing.isTruncated()) {
    objectListing = s3Client.listNextBatchOfObjects(objectListing);
    summaries.addAll(objectListing.getObjectSummaries());
}

While I can translate that into Scala directly, I would prefer a more idiomatic Scala approach.

How can I get all pages of a bucket listing using Scala?

I am now using a recursive approach that accumulates the results on each call and returns the final collection once the last page is reached.

The relevant part is the getAllSummaries method; I have kept the other implementation details in case they help others get it working more easily. (My AmazonS3Config is a basic case class containing my S3 credentials.)

import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.regions.Regions
import com.amazonaws.services.s3.model.{ObjectListing, S3ObjectSummary}
import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}

import scala.collection.JavaConverters._

object Starter extends App with Configurable {

  private lazy val client: AmazonS3 = createAmazonClient(this.config.s3)

  val objects = getAllObjects()

  def getAllObjects(): Seq[S3ObjectSummary] = {
    val bucket = "YOUR_BUCKET_NAME"
    val prefix = ""

    val objectListing: ObjectListing = client.listObjects(bucket, prefix)

    getAllSummaries(objectListing)
  }

  private def getAllSummaries(list: ObjectListing,
                              res: Seq[S3ObjectSummary] = Seq.empty[S3ObjectSummary]): Seq[S3ObjectSummary] =
    list.isTruncated match {
      case false => {
        res ++ list.getObjectSummaries.asScala
      }
      case true =>
        val newList = this.client.listNextBatchOfObjects(list)
        getAllSummaries(newList, res ++ newList.getObjectSummaries.asScala)

    }

  private def createAmazonClient(config: AmazonS3Config): AmazonS3 = {
    val region = Regions.valueOf(config.region)
    val awsCredentials = new BasicAWSCredentials(config.accessKey, config.secretKey)

    AmazonS3ClientBuilder
      .standard()
      .withCredentials(new AWSStaticCredentialsProvider(awsCredentials))
      .withRegion(region)
      .build()
  }
}

There is a mistake in the code above: the first batch of data is missed (and the last batch is appended twice), because the recursive call accumulates the summaries of newList instead of list. Change

getAllSummaries(newList, res ++ newList.getObjectSummaries.asScala)

to

getAllSummaries(newList, res ++ list.getObjectSummaries.asScala)

The corrected getAllSummaries function is:

 private def getAllSummaries(list: ObjectListing,
                              res: Seq[S3ObjectSummary] = Seq.empty[S3ObjectSummary]): Seq[S3ObjectSummary] =
    list.isTruncated match {
      case false => {
        res ++ list.getObjectSummaries.asScala
      }
      case true =>
        val newList = this.client.listNextBatchOfObjects(list)
        getAllSummaries(newList, res ++ list.getObjectSummaries.asScala)

    }
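To see the difference without touching S3, here is a minimal sketch that replays both versions against three fake pages. The Page case class and the names buggy and fixed are hypothetical stand-ins for ObjectListing and the two getAllSummaries variants; they are not part of the AWS SDK.

```scala
import scala.annotation.tailrec

object FirstBatchDemo {
  // Hypothetical stand-in for ObjectListing: some items plus an optional next page.
  final case class Page(items: List[String], next: Option[Page])

  val page3 = Page(List("e"), None)
  val page2 = Page(List("c", "d"), Some(page3))
  val page1 = Page(List("a", "b"), Some(page2))

  // Mirrors the original code: accumulates the items of the NEXT page.
  @tailrec
  def buggy(list: Page, res: List[String] = Nil): List[String] =
    list.next match {
      case None          => res ++ list.items
      case Some(newList) => buggy(newList, res ++ newList.items)
    }

  // Mirrors the corrected code: accumulates the items of the CURRENT page.
  @tailrec
  def fixed(list: Page, res: List[String] = Nil): List[String] =
    list.next match {
      case None          => res ++ list.items
      case Some(newList) => fixed(newList, res ++ list.items)
    }
}
```

With these fake pages, buggy(page1) drops "a" and "b" and contains "e" twice, while fixed(page1) returns every item exactly once and in order.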

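A more idiomatic Scala style uses if/else instead of matching on a Boolean, and adds the @tailrec annotation so that the compiler verifies the recursion is compiled into a loop (inside a class or trait the method must be private or final for @tailrec to compile):

import scala.annotation.tailrec

@tailrec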
def getAllSummaries(list: ObjectListing,
                    acc: Seq[S3ObjectSummary]): Seq[S3ObjectSummary] =
  if (list.isTruncated) {
    val newList = client.listNextBatchOfObjects(list)
    getAllSummaries(newList, acc ++ list.getObjectSummaries.asScala)
  } else {
    acc ++ list.getObjectSummaries.asScala
  }

Use it like this:

val objectListing: ObjectListing = client.listObjects(bucket, prefix)
getAllSummaries(objectListing, Seq.empty[S3ObjectSummary])
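If you would rather avoid explicit recursion, the same paging can probably be expressed with Iterator.iterate. The following is only a sketch under assumptions: Page, pages, fetchNext and allItems are hypothetical names, and Page is a stand-in for ObjectListing so that the example runs without AWS. With the real client, fetchNext would roughly correspond to returning Some(client.listNextBatchOfObjects(list)) while list.isTruncated is true, and None otherwise.

```scala
object PagingSketch {
  // Hypothetical stand-in for ObjectListing; not part of the AWS SDK.
  final case class Page(items: List[String], nextIndex: Option[Int])

  // Three fake pages playing the role of the paginated bucket listing.
  val pages: Vector[Page] = Vector(
    Page(List("a", "b"), Some(1)),
    Page(List("c", "d"), Some(2)),
    Page(List("e"), None)
  )

  // Plays the role of listNextBatchOfObjects: None once the last page is reached.
  def fetchNext(page: Page): Option[Page] =
    page.nextIndex.map(i => pages(i))

  // Lazily walks first page, next page, ... until fetchNext yields None,
  // then concatenates the items of every visited page.
  def allItems(first: Page): List[String] =
    Iterator
      .iterate(Option(first))(_.flatMap(fetchNext))
      .takeWhile(_.isDefined)
      .collect { case Some(page) => page }
      .flatMap(_.items)
      .toList
}
```

Because the iterator is lazy, each following page is only requested when the previous one has been consumed, and the last (non-truncated) page is still included in the result.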
