
AWS S3 - Listing all objects inside a folder without the prefix

I'm having problems retrieving all objects (file names) inside a folder in AWS S3. Here's my code:

    ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
            .withBucketName(bucket)
            .withPrefix(folderName + "/")
            // Start listing after the "folderName/" placeholder key itself
            .withMarker(folderName + "/")

    ObjectListing objectListing = amazonWebService.s3.listObjects(listObjectsRequest)

    for (S3ObjectSummary summary : objectListing.getObjectSummaries()) {
        print summary.getKey()
    }

It returns the correct objects, but with the prefix included, e.g. foldername/filename.

I know I could use Java's substring to strip the prefix, but I wanted to know whether the AWS SDK provides a method for it.

There is not. The page linked below lists all the methods available on S3ObjectSummary. The reason lies in S3's design: S3 does not have "subfolders". Instead, a bucket is simply a flat list of objects, where each key is the "prefix" plus the file name you want. The console displays the data as if it were stored in Windows-style "folders", but there is no folder logic in S3 itself.

http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/S3ObjectSummary.html
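To make the "no folders" point concrete, here is a rough in-memory sketch (not an SDK call; the FlatKeyListing class and the sample keys are made up) of how "folders" can be derived from a flat key list the way S3 does when a request specifies a prefix and a "/" delimiter: keys with no further "/" after the prefix come back as objects, and everything deeper collapses into a common prefix, which the console draws as a sub-folder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, in-memory illustration (no AWS SDK): groups flat S3-style keys
// the way a listing request with a prefix and a delimiter does.
class FlatKeyListing {

    final List<String> directKeys = new ArrayList<>();   // objects directly "in" the folder
    final Set<String> commonPrefixes = new TreeSet<>();  // "sub-folders", sorted

    FlatKeyListing(List<String> allKeys, String prefix, String delimiter) {
        for (String key : allKeys) {
            if (!key.startsWith(prefix)) {
                continue; // key lies outside the requested "folder"
            }
            String rest = key.substring(prefix.length());
            int idx = rest.indexOf(delimiter);
            if (idx < 0) {
                // No delimiter after the prefix: a direct child object
                directKeys.add(key);
            } else {
                // Collapse everything up to and including the first delimiter
                commonPrefixes.add(prefix + rest.substring(0, idx + delimiter.length()));
            }
        }
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "photos/a.jpg",
                "photos/2023/b.jpg",
                "photos/2023/c.jpg",
                "docs/readme.txt");
        FlatKeyListing listing = new FlatKeyListing(keys, "photos/", "/");
        System.out.println(listing.directKeys);     // [photos/a.jpg]
        System.out.println(listing.commonPrefixes); // [photos/2023/]
    }
}
```

With the real SDK, the corresponding request options are withPrefix and withDelimiter on ListObjectsRequest, and the grouped "sub-folders" come back from ObjectListing.getCommonPrefixes().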

Your best bet is to split each key on "/" and take the last element of the resulting array.
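A quick illustration in plain Java (the KeyNames class and its methods are hypothetical helpers, not SDK methods): taking the last "/"-separated segment flattens nested keys, whereas removing the known prefix with substring keeps any deeper sub-path. Note also that String.split drops trailing empty strings, so the placeholder key "folder/" yields "folder" rather than an empty name.

```java
// Hypothetical helpers for turning full S3 keys into names relative to a "folder".
class KeyNames {

    // Split on "/" and keep the last element (the approach suggested above).
    static String lastSegment(String key) {
        String[] parts = key.split("/");
        return parts[parts.length - 1];
    }

    // Alternative: drop the listing prefix, keeping any deeper sub-path.
    static String relativeTo(String key, String prefix) {
        return key.startsWith(prefix) ? key.substring(prefix.length()) : key;
    }

    public static void main(String[] args) {
        String prefix = "folder/";
        for (String key : new String[] {"folder/file.txt", "folder/sub/file2.txt"}) {
            System.out.println(lastSegment(key) + " | " + relativeTo(key, prefix));
        }
        // Prints:
        // file.txt | file.txt
        // file2.txt | sub/file2.txt
    }
}
```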

For Scala developers, here is a recursive function that performs a full scan and map over the contents of an Amazon S3 bucket using the official AWS SDK for Java:

    import com.amazonaws.services.s3.AmazonS3Client
    import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing}
    // JavaConversions is deprecated since Scala 2.12 and removed in 2.13
    import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

    def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

      def scan(acc:List[T], listing:ObjectListing): List[T] = {
        val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
        val mapped = (for (summary <- summaries) yield f(summary)).toList

        // Bug when the listing spans several pages: this branch returns only the
        // last page and discards acc; it should return acc ::: mapped (see the
        // follow-up answer below).
        if (!listing.isTruncated) mapped.toList
        else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
      }

      scan(List(), s3.listObjects(bucket, prefix))
    }

To invoke the curried map() function above, pass an already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name, and the prefix in the first parameter list. In the second parameter list, pass the function f() you want to apply to each object summary.

For example,

    map(s3, bucket, prefix) { s => println(s.getKey.split("/").last) }

prints every file name without the prefix (the last "/"-separated segment of each key),

    val tuples = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner, s.getSize))

returns the full list of (key, owner, size) tuples under that bucket/prefix, and

    val totalSize = map(s3, bucket, prefix)(s => s.getSize).sum

returns the total size, in bytes, of that content (note the additional sum fold applied at the end of the expression).

You can combine map() with many other functions, just as you would normally compose monads in functional programming.

This code helped me list the contents of a sub-directory of my bucket.

Example: "Testing" is my bucket name. Inside it there is a "kdblue@gmail.com" folder, which contains an "IMAGE" folder holding the image files.

    ArrayList<String> transferRecord = new ArrayList<>();

    ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
            .withBucketName(Constants.BUCKET_NAME)
            // The trailing "/" keeps sibling keys such as "IMAGE2/..." out of the results
            .withPrefix("kdblue@gmail.com" + "/IMAGE/");

    ObjectListing objects = s3.listObjects(listObjectsRequest);
    while (true) {
        // Add the full key of every object on the current page
        for (S3ObjectSummary summary : objects.getObjectSummaries()) {
            transferRecord.add(summary.getKey());
        }
        // Each response holds at most 1,000 keys; stop once S3 reports no more pages
        if (!objects.isTruncated()) {
            break;
        }
        objects = s3.listNextBatchOfObjects(objects);
    }

I hope this helps you.

Just to follow up on the Scala answer above (the recursive function that performs a full scan and map): as @Eric highlighted, the code has a bug when there are more than 1,000 keys under the prefix. Each listing page holds at most 1,000 keys, and the non-truncated branch returns only the final page's results, discarding the earlier pages. The fix is simple: in that branch, mapped.toList has to be appended to acc.

    def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

      def scan_s3_bucket(acc:List[T], listing:ObjectListing): List[T] = {
        val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
        val mapped = (for (summary <- summaries) yield f(summary)).toList

        if (!listing.isTruncated) {
          acc ::: mapped.toList
        } else {
          println("list extended, more to go: new_keys '%s', current_length '%s'".format(mapped.length, acc.length))
          scan_s3_bucket(acc ::: mapped, s3.listNextBatchOfObjects(listing))
        }
      }

      scan_s3_bucket(List(), s3.listObjects(bucket, prefix))
    }
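The heart of this fix is that each page's results must be added to the accumulator, including the final page. Here is a rough Java sketch of the same pattern, written iteratively and without the SDK so it can run on its own: the nested Page class is a hypothetical stand-in for ObjectListing, and the fetchNext function plays the role of listNextBatchOfObjects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

class PagedMap {

    // Hypothetical stand-in for ObjectListing: one page of keys plus a "more pages" flag.
    static final class Page {
        final int index;          // position of this page in the fake result set
        final List<String> keys;  // keys returned on this page
        final boolean truncated;  // true if another page follows

        Page(int index, List<String> keys, boolean truncated) {
            this.index = index;
            this.keys = keys;
            this.truncated = truncated;
        }
    }

    // Iterative version of the corrected scan: every page's mapped results are
    // appended to acc, and acc is returned only after the last page is added.
    static <T> List<T> mapAllPages(Page first, UnaryOperator<Page> fetchNext, Function<String, T> f) {
        List<T> acc = new ArrayList<>();
        Page page = first;
        while (true) {
            for (String key : page.keys) {
                acc.add(f.apply(key));
            }
            if (!page.truncated) {
                return acc;
            }
            page = fetchNext.apply(page);
        }
    }

    // Builds fake page i from an in-memory list of pages (stands in for a network call).
    static Page fakePage(List<List<String>> pages, int i) {
        return new Page(i, pages.get(i), i < pages.size() - 1);
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("p/a", "p/b"),
                Arrays.asList("p/c"),
                Arrays.asList("p/d"));
        List<String> names = mapAllPages(
                fakePage(pages, 0),
                prev -> fakePage(pages, prev.index + 1),
                key -> key.substring("p/".length()));
        System.out.println(names); // [a, b, c, d]
    }
}
```

A loop is used instead of recursion here because Java does not eliminate tail calls, so a very long listing could otherwise exhaust the stack.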

The snippet below worked quite well for me. Reference: https://codeflex.co/get-list-of-objects-from-s3-directory/

    List<String> getObjectslistFromFolder(String bucketName, String folderKey, AmazonS3 s3Client) {

        ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
                .withBucketName(bucketName)
                .withPrefix(folderKey + "/");

        List<String> keys = new ArrayList<String>();

        ObjectListing objects = s3Client.listObjects(listObjectsRequest);
        for (;;) {
            List<S3ObjectSummary> summaries = objects.getObjectSummaries();
            if (summaries.size() < 1) {
                break;
            }

            // Collect the full key of each object (the lambda requires Java 8+)
            summaries.forEach(s -> keys.add(s.getKey()));

            // Fetch the next page; after the last page this returns an empty listing, ending the loop
            objects = s3Client.listNextBatchOfObjects(objects);
        }

        return keys;
    }

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM