
List all objects in S3 with a given prefix in Scala

I am trying to list all objects in an AWS S3 bucket, given a bucket name and a filter prefix, using the following code.

import scala.collection.JavaConverters._
import scala.collection.mutable
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.{ListObjectsV2Request, ListObjectsV2Result}

val s3_client = AmazonS3ClientBuilder.defaultClient()
val bucket_name = "Mybucket"
val filter_prefix = "Test/a/"

def list_objects(str: String): mutable.Buffer[String] = {
  val request: ListObjectsV2Request = new ListObjectsV2Request().withBucketName(bucket_name).withPrefix(str)
  var result: ListObjectsV2Result = new ListObjectsV2Result()
  // request pages until S3 reports that the listing is complete
  do {
    result = s3_client.listObjectsV2(request)
    val token = result.getNextContinuationToken
    System.out.println("Next Continuation Token: " + token)
    request.setContinuationToken(token)
  } while (result.isTruncated)
  result.getObjectSummaries.asScala.map(_.getKey)
}

list_objects(filter_prefix)

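I have applied the continuation-token approach, but I only get back the last page of objects. For example, the prefix contains 2,210 objects, but I only get 210 back.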

Regards, 鲯鳅鱼
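The loop in the question assigns a new response to `result` on every iteration, so after the loop `getObjectSummaries` only holds the final page. Assuming each full page carries the default maximum of 1,000 keys, 2,210 keys arrive as pages of 1,000, 1,000 and 210, which matches the 210 observed. The sketch below reproduces this without AWS; the `pages` vector is a hypothetical in-memory stand-in for three `listObjectsV2` responses.

```scala
object OverwriteVsAccumulate {
  // Hypothetical stand-in for three listObjectsV2 responses: 1,000 + 1,000 + 210 keys.
  val pages: Vector[Vector[String]] = Vector(
    Vector.tabulate(1000)(i => s"Test/a/obj-$i"),
    Vector.tabulate(1000)(i => s"Test/a/obj-${1000 + i}"),
    Vector.tabulate(210)(i => s"Test/a/obj-${2000 + i}")
  )

  // Mirrors the question's loop: each iteration replaces the previous page.
  def lastPageOnly(pages: Vector[Vector[String]]): Vector[String] = {
    var result = Vector.empty[String]
    for (page <- pages) result = page
    result
  }

  // The fix: append every page's keys to an accumulator before moving on.
  def allPages(pages: Vector[Vector[String]]): Vector[String] = {
    var acc = Vector.empty[String]
    for (page <- pages) acc = acc ++ page
    acc
  }
}
```

`lastPageOnly` returns 210 keys and `allPages` returns all 2,210, so the fix is to collect each page's keys inside the loop rather than reading them once after it.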

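`listObjectsV2` returns some or all (up to 1,000) of the objects in a bucket per request, as described in the AWS documentation. You need to use the continuation token to iterate over the rest of the objects in the bucket.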

Here is some sample code in Java.
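The continuation-token pattern itself does not depend on the SDK: request a page, keep its keys, and repeat with the returned token until none comes back. A minimal sketch, assuming a hypothetical `fetchPage` function that plays the role of `listObjectsV2` (it takes an optional token and returns the page's keys plus the next token, `None` on the last page):

```scala
import scala.annotation.tailrec

object ContinuationLoop {
  // Hypothetical page type: the keys of one response and the token for the next request.
  final case class Page(keys: Seq[String], nextToken: Option[String])

  // Requests pages until fetchPage stops returning a continuation token,
  // keeping every page's keys in the order the pages arrive.
  def listAll(fetchPage: Option[String] => Page): Vector[String] = {
    @tailrec
    def loop(token: Option[String], acc: Vector[String]): Vector[String] = {
      val page = fetchPage(token)
      val all  = acc ++ page.keys
      page.nextToken match {
        case Some(_) => loop(page.nextToken, all)
        case None    => all
      }
    }
    loop(None, Vector.empty)
  }
}
```

Any `Option[String] => Page` works as `fetchPage`, including a `Map` from token to page in a unit test. With the v1 SDK from the question, it would set the token on the `ListObjectsV2Request`, call `listObjectsV2`, and wrap the keys from `getObjectSummaries` and `Option(getNextContinuationToken)` into a `Page`.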

This is the code that worked for me.

import scala.collection.JavaConverters._
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.{ListObjectsV2Request, ListObjectsV2Result}

val bucket_name = "Mybucket"
val filter_prefix = "Test/a/"

def list_objects(str: String): List[String] = {
  val s3_client = AmazonS3ClientBuilder.defaultClient()
  var final_list: List[String] = List()
  val request: ListObjectsV2Request = new ListObjectsV2Request().withBucketName(bucket_name).withPrefix(str)
  var result: ListObjectsV2Result = new ListObjectsV2Result()
  do {
    result = s3_client.listObjectsV2(request)
    val token = result.getNextContinuationToken
    System.out.println("Next Continuation Token: " + token)
    request.setContinuationToken(token)
    // keep this page's keys before the next request replaces `result`
    val list = result.getObjectSummaries.asScala.map(_.getKey).toList
    println(list.size)
    final_list = final_list ::: list
  } while (result.isTruncated)
  println(s"size: ${final_list.size}")
  final_list
}

list_objects(filter_prefix)
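`final_list ::: list` builds a new list by copying `final_list` on every page, so the total copying grows with the square of the number of pages. That is negligible for a few pages but adds up for prefixes with many pages. A sketch of the same accumulation using a `ListBuffer`, which appends in place; `pages` is a hypothetical stand-in for the per-page key lists:

```scala
import scala.collection.mutable.ListBuffer

object BufferAccumulation {
  // As in the answer: ::: copies the accumulated list on every page.
  def withConcat(pages: Seq[List[String]]): List[String] = {
    var finalList: List[String] = List()
    for (page <- pages) finalList = finalList ::: page
    finalList
  }

  // A ListBuffer appends each page in place and converts to a List once at the end.
  def withBuffer(pages: Seq[List[String]]): List[String] = {
    val buf = ListBuffer.empty[String]
    for (page <- pages) buf ++= page
    buf.toList
  }
}
```

Both return the same keys in the same order; only the cost of building the result differs.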

A solution in plain Scala that avoids `var`s by using tail recursion:

  import software.amazon.awssdk.regions.Region
  import software.amazon.awssdk.services.s3.S3Client
  import software.amazon.awssdk.services.s3.model.{ListObjectsV2Request, ListObjectsV2Response}

  import scala.annotation.tailrec
  import scala.collection.JavaConverters.asScalaBufferConverter
  import scala.collection.mutable
  import scala.collection.mutable.ListBuffer

  val sourceBucket    = "yourbucket"
  val sourceKey       = "yourKey"
  val subFolderPrefix = "yourprefix"


  def getAllPaths(s3Client: S3Client, initReq: ListObjectsV2Request): List[String] = {
    @tailrec
    def listAllObjectsV2(
      s3Client: S3Client,
      req: ListObjectsV2Request,
      tokenOpt: Option[String],
      isFirstTime: Boolean,
      initList: ListBuffer[String]
    ): ListBuffer[String] = {
      println(s"IsFirstTime = ${isFirstTime}, continuationToken = ${tokenOpt}")
      (isFirstTime, tokenOpt) match {
        case (true, Some(x)) =>
          // this combo is not possible..
          initList
        case (false, None) =>
          // end
          initList
        case (_, _) =>
          // possible scenarios are :
          // true, None : First iteration
          // false, Some(x): Second iteration onwards
          val response =
            s3Client.listObjectsV2(tokenOpt.fold(req)(token => req.toBuilder.continuationToken(token).build()))
          val keys: Seq[String] = response.contents().asScala.toList.map(_.key())
          val nextTokenOpt      = Option(response.nextContinuationToken())
          // append so the keys stay in the order the pages arrive
          listAllObjectsV2(s3Client, req, nextTokenOpt, isFirstTime = false, initList ++= keys)
      }
    }
    listAllObjectsV2(s3Client, initReq, None, true, mutable.ListBuffer.empty[String]).toList
  }
  val s3Client = S3Client.builder().region(Region.US_WEST_2).build()
  val request: ListObjectsV2Request =
      ListObjectsV2Request.builder
        .bucket(sourceBucket)
        .prefix(sourceKey + "/" + subFolderPrefix)
        .build

  val listofAllKeys: List[String] = getAllPaths(s3Client, request)
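The `isFirstTime` flag is there only because the first request carries no token. An alternative sketch fetches one page per call and recurses only while a continuation token comes back, so the match needs just two cases. `Req`, `withToken` and `fetch` are hypothetical hooks: with the SDK v2 code above, `withToken` would correspond to `req.toBuilder.continuationToken(token).build()` and `fetch` to calling `listObjectsV2` and reading `contents()` and `nextContinuationToken()`. Keeping them abstract lets the recursion run without AWS.

```scala
import scala.annotation.tailrec
import scala.collection.mutable.ListBuffer

object TailRecListing {
  // Req is the client's request type; withToken and fetch stand in for the SDK calls.
  def listAll[Req](
      initReq: Req,
      withToken: (Req, String) => Req,
      fetch: Req => (Seq[String], Option[String])
  ): List[String] = {
    @tailrec
    def loop(req: Req, acc: ListBuffer[String]): ListBuffer[String] = {
      val (keys, nextToken) = fetch(req)
      acc ++= keys // append, so keys keep the order in which pages arrive
      nextToken match {
        case Some(token) => loop(withToken(initReq, token), acc)
        case None        => acc
      }
    }
    loop(initReq, ListBuffer.empty[String]).toList
  }
}
```

Because the first call always fetches and only a returned token triggers another call, the "first iteration" and "no more pages" states no longer need to be tracked separately.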


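Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.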

 
© 2020-2024 STACKOOM.COM