Amazon S3: How to get a list of folders in the bucket?

The only thing I've found is the GET Bucket method.
But I can't figure out how to get only the list of folders in the current folder. What prefix and delimiter do I need to use? Is that possible at all?

For the sake of example, assume I have a bucket in the USEast1 region called MyBucketName, with the following keys:

 temp/
 temp/foobar.txt
 temp/txt/
 temp/txt/test1.txt
 temp/txt/test2.txt
 temp2/

Working with folders can be confusing because S3 does not natively support a hierarchical structure -- these are simply keys like any other S3 object. Folders are an abstraction in the S3 web console that makes it easier to navigate a bucket. So when we're working programmatically, we want to find keys matching the characteristics of a 'folder': keys that end with the delimiter '/' and have a size of 0, because those are what the S3 console will present to us as 'folders'.

Note for both examples: I'm using the AWSSDK.S3 version 3.1 NuGet package.

Example 1: All folders in a bucket

This code is modified from the basic example in the S3 documentation for listing all keys in a bucket. The example below identifies all keys that end with the delimiter character / and have a size of 0.

using (IAmazonS3 client = new AmazonS3Client(Amazon.RegionEndpoint.USEast1))
{
    // Build your request to list objects in the bucket
    ListObjectsRequest request = new ListObjectsRequest
    {
        BucketName = "MyBucketName"
    };

    do
    {
        // Build your call out to S3 and store the response
        ListObjectsResponse response = client.ListObjects(request);

        // Filter through the response to find keys that:
        // - end with the delimiter character '/' 
        // - are empty. 
        IEnumerable<S3Object> folders = response.S3Objects.Where(x =>
            x.Key.EndsWith(@"/") && x.Size == 0);

        // Do something with your output keys.  For this example, we write to the console.
        folders.ToList().ForEach(x => System.Console.WriteLine(x.Key));

        // If the response is truncated, we'll make another request 
        // and pull the next batch of keys
        if (response.IsTruncated)
        {
            request.Marker = response.NextMarker;
        }
        else
        {
            request = null;
        }
    } while (request != null);
}

Expected output to console:

temp/
temp/txt/
temp2/

Example 2: Folders matching a specified prefix

You could further limit this to retrieve only folders matching a specified prefix by setting the Prefix property on the ListObjectsRequest.

ListObjectsRequest request = new ListObjectsRequest
    {
        BucketName = "MyBucketName",
        Prefix = "temp/"
    };

When applied to Example 1, we would expect the following output:

temp/
temp/txt/

Further reading:

Using a prefix of the/path/to/read/ (note that there is no leading slash, but there is a trailing slash) and a delimiter of /, you'll find all the folders within that folder inside <CommonPrefixes>.

CommonPrefixes

A response can contain CommonPrefixes only if you specify a delimiter. When you do, CommonPrefixes contains all (if there are any) keys between Prefix and the next occurrence of the string specified by delimiter. In effect, CommonPrefixes lists keys that act like subdirectories in the directory specified by Prefix . For example, if prefix is notes/ and delimiter is a slash (/), in notes/summer/july, the common prefix is notes/summer/. All of the keys rolled up in a common prefix count as a single return when calculating the number of returns. See MaxKeys.

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
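
To tie this back to the C# examples above, here's a minimal sketch (my own illustration, assuming the same bucket, keys, and AWSSDK.S3 package as Example 1) that reads the subfolders out of CommonPrefixes instead of filtering zero-byte keys:

using (IAmazonS3 client = new AmazonS3Client(Amazon.RegionEndpoint.USEast1))
{
    ListObjectsRequest request = new ListObjectsRequest
    {
        BucketName = "MyBucketName",
        Prefix = "temp/",   // no leading slash, trailing slash included
        Delimiter = "/"     // roll up everything below this level
    };

    ListObjectsResponse response = client.ListObjects(request);

    // With a delimiter set, the 'subfolders' are returned in
    // CommonPrefixes rather than in S3Objects.
    foreach (string commonPrefix in response.CommonPrefixes)
    {
        System.Console.WriteLine(commonPrefix);  // prints "temp/txt/"
    }
}

With this approach the folders don't need to exist as zero-byte keys at all; S3 derives the common prefixes from the key names themselves.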

Alternatively, a simpler approach is to use https://github.com/minio/minio-dotnet

Minio .NET implements minimal APIs for working with Amazon S3 and other compatible storage solutions.

The following example shows how to list only the directories. Here each CommonPrefix is surfaced as a folder through the ListObjects() API.

using System;

using Minio;
using Minio.Xml;

namespace Minio.Examples
{
    class ListObjects
    {
        static int Main(string[] args)
        {
            var client = new MinioClient("https://s3.amazonaws.com", "ACCESSKEY", "SECRETKEY");

            var items = client.ListObjects("bucket");

            foreach (Item item in items)
            {
                if (item.IsDir)
                {
                    Console.Out.WriteLine("{0}", item.Key);
                }
            }
            return 0;
        }
    }
}

What Anthony is missing here is that a folder doesn't necessarily have a key associated with it. If a file is created in S3 and given a key like "folder/name.ext", S3 will display a folder named "folder", but that folder doesn't have its own key, so it won't appear in your results.

The only way to catch these folders is to look at the keys themselves and parse each key name for the "/" character. If I knew C# a little better, I'd write you a code sample, but for reference here's a Python example I wrote on another question.
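
Since the original answer didn't include C#, here's a rough sketch of that approach (my own illustration, not the linked Python example), reusing the client and request from Example 1 above; HashSet needs System.Collections.Generic. It derives every folder from the key names, whether or not a zero-byte folder key exists:

var folders = new HashSet<string>();

ListObjectsResponse response = client.ListObjects(request);
foreach (S3Object obj in response.S3Objects)
{
    // Each '/' in a key implies a folder: "temp/txt/test1.txt"
    // yields "temp/" and "temp/txt/".
    string[] parts = obj.Key.Split('/');
    string prefix = "";
    for (int i = 0; i < parts.Length - 1; i++)
    {
        prefix += parts[i] + "/";
        folders.Add(prefix);
    }
}

foreach (string folder in folders)
{
    System.Console.WriteLine(folder);
}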

Adding a delimiter of '/' to my parameters did the trick for me.

In case anyone needs a NodeJS solution, this is what I used:

listdelim: function (path) {
    const params = {
        Bucket: process.env['bucket'],
        MaxKeys: 1000,
        Prefix: path,
        Delimiter: '/'
    }
    return new Promise((resolve, reject) => {
        s3.listObjectsV2(params, function (err, data) {
            if (err) {
                console.log(err, err.stack)
                reject(err)
            } else {
                resolve(data)
            }
        })
    })
}
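
With Delimiter set like this, the 'folder' names come back in data.CommonPrefixes (each entry exposes a Prefix property), while data.Contents holds only the keys at the current level.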

In case you don't mind using a third-party open-source tool, you can use rclone's lsf command, which supports the --dirs-only and --recursive flags.

  1. Install rclone
  2. Set environment variables
    export AWS_REGION=eu-west-1
    export S3_BUCKET_NAME=my.private.bucket
    export S3_BUCKET_PREFIX=temp
  3. Get all directories under a given prefix. If the prefix is empty, the whole bucket will be scanned, which is not recommended for buckets with many objects. The exclude pattern */*/*/ is evaluated relative to the prefix, so directories matching temp/*/*/*/ (and anything deeper) are excluded from the listing.
     export \
       RCLONE_S3_REGION=${AWS_REGION} RCLONE_S3_PROVIDER=AWS \
       RCLONE_CONFIG_MYS3_TYPE=s3 && \
     rclone \
       --exclude "*/*/*/" \
       lsf mys3,env_auth:"${S3_BUCKET_NAME}/${S3_BUCKET_PREFIX}" \
       --recursive --dirs-only --config /dev/null

NOTE-1: Using the env_auth option enables the default AWS credentials providers, as if you were using the AWS CLI. Bottom line, env_auth allows you to authenticate with AWS_PROFILE, or with AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY.

NOTE-2: Adding the --config /dev/null option suppresses the warning message: NOTICE: Config file "/Users/username/.config/rclone/rclone.conf" not found - using defaults

Bonus

In case the bucket is meant for build artifacts with branch-based paths, here's a great example of how to list S3 directories with advanced exclusion patterns.

  1. Set environment variables
    export AWS_REGION=eu-west-1
    export S3_BUCKET_NAME=my.private.bucket
    export S3_BUCKET_PREFIX=artifacts
  2. Assuming the S3 artifacts dir structure is:
    • artifacts/${version}/development/
    • artifacts/${version}/RC/
    • artifacts/${version}/feature/${branch_name}/
  3. Examples of paths that will be included :
    • artifacts/v1.1.1/development/
    • artifacts/v1.1.0/RC/
    • artifacts/v1.1.1/feature/jira-123-ticket/
  4. Examples of paths that will be excluded :
    • artifacts/v1.1.1/development/docs/
    • artifacts/v1.1.0/RC/docs/
    • artifacts/v1.1.1/feature/jira-123-ticket/docs/
  5. Get list of directories with rclone
     export \
       RCLONE_S3_REGION=${AWS_REGION} RCLONE_S3_PROVIDER=AWS \
       RCLONE_CONFIG_MYS3_TYPE=s3 && \
     rclone \
       --exclude "*/development/*/" \
       --exclude "*/RC/*/" \
       --exclude "*/*/*/*/" \
       lsf mys3,env_auth:"${S3_BUCKET_NAME}/${S3_BUCKET_PREFIX}" \
       --recursive --dirs-only --config /dev/null

Troubleshooting

I had to unset the environment variable AWS_CA_BUNDLE before using rclone to make it work.

unset AWS_CA_BUNDLE
