How to download multiple files having same prefix as filename from various folders of S3 bucket?

Let's say that I have an S3 bucket named bucketSample.

It contains different folders, such as abc, def and xyz.

Each of these folders holds multiple files whose names start with the prefix hij_.

I want to download all the files with the prefix hij_ (for example, hij_qwe.txt, hij_rty.pdf, etc.).

I have looked at several approaches, but GetObject requires specific object names, and I only know the prefix.

With TransferManager I can download all the files in folder abc, but not only the files with a specific prefix.

So is there a way to download only the files with the prefix hij_?

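import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import java.util.Set;

// s3Client is assumed to be an AmazonS3 field of the enclosing class
public void getFiles(final String bucketName, final Set<String> keys, final Set<String> prefixes) {
    try {
        // Lists the objects in the bucket; results come back one page at a time
        ObjectListing objectListing = s3Client.listObjects(bucketName);
        while (true) {
            for (S3ObjectSummary summary : objectListing.getObjectSummaries()) {
                for (String key : keys) {
                    for (String prefix : prefixes) {
                        if (summary.getKey().startsWith(key + "/" + prefix)) {
                            // summary.getKey() is the full key name here, so the object
                            // can be downloaded to a new file, e.g. with the TransferManager
                        }
                    }
                }
            }
            // Fetch the next page of results, if there is one
            if (objectListing.isTruncated()) {
                objectListing = s3Client.listNextBatchOfObjects(objectListing);
            } else {
                break;
            }
        }
    } catch (AmazonServiceException e) {
        // Report the failure instead of silently swallowing it
        System.err.println("S3 request failed: " + e.getErrorMessage());
    }
}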

Read about how S3 organizes objects here: How does AWS S3 store files? (directory structure)

S3 has no real directories; a "folder" is just the leading part of an object's key. Therefore, for your use case, key + "/" + prefix acts as the prefix of the objects stored in the S3 bucket. By comparing that prefix with the keys of all the objects in the bucket, you can get the full key name of each matching file.
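
To make the matching rule concrete, here is a minimal sketch in plain Python (chosen for brevity; it makes no AWS calls) of the same key + "/" + prefix comparison. The function name select_keys and the sample keys are hypothetical, made up purely for illustration.

```python
def select_keys(all_keys, folders, prefixes):
    """Return the keys that start with '<folder>/<prefix>' for any given pair."""
    # str.startswith accepts a tuple, so build every folder/prefix combination once
    wanted = tuple(f"{folder}/{prefix}" for folder in folders for prefix in prefixes)
    return [key for key in all_keys if key.startswith(wanted)]


# Hypothetical keys, as they might appear in bucketSample
sample_keys = [
    "abc/hij_qwe.txt",
    "abc/other.txt",
    "def/hij_rty.pdf",
    "xyz/notes/hij_deep.txt",
    "hij_top.txt",
]
matched = select_keys(sample_keys, ["abc", "def", "xyz"], ["hij_"])
print(matched)
```

Note that a file nested one level deeper, such as xyz/notes/hij_deep.txt, is not matched, because the comparison is anchored at the start of the key.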

With Python you can use the boto3 library, which I found very useful for solving a similar case.

Sample code:

import os

import boto3
from botocore.exceptions import BotoCoreError, ClientError

KEY = ''
SECRET = ''
download_folder = os.path.join(os.path.expanduser('~'), 'Downloads')
bucket = 'bucketSample'
folders = ['abc', 'def', 'xyz']
prefixes = ['hij_']

try:
    # One client handles both listing and downloading; supply the
    # aws_access_key_id and aws_secret_access_key for your bucket if available
    client = boto3.client(
        's3',
        aws_access_key_id=KEY,
        aws_secret_access_key=SECRET)

    # The paginator fetches only objects under a given prefix instead of iterating over all objects
    paginator = client.get_paginator('list_objects')

    for folder in folders:
        for file_prefix in prefixes:
            # Keys look like 'abc/hij_qwe.txt', so join folder and file prefix with '/'
            prefix = folder + '/' + file_prefix
            page_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)

            for page in page_iterator:
                # 'Contents' is absent when nothing matches the prefix
                for content in page.get('Contents', []):
                    file_path = os.path.join(download_folder, content['Key'])
                    # Create the local folder (e.g. ~/Downloads/abc) before downloading into it
                    os.makedirs(os.path.dirname(file_path), exist_ok=True)
                    client.download_file(bucket, content['Key'], file_path)
except (BotoCoreError, ClientError, OSError) as error:
    print('An error occurred:', error)
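
If the folder names are not known in advance, one option is to list every key in the bucket and compare the prefix against the last path segment only. The sketch below assumes that situation. iter_keys_by_filename_prefix is a hypothetical helper, not part of boto3; it takes the client as a parameter, so any object that behaves like boto3.client('s3') can be passed in. It also uses the list_objects_v2 paginator rather than the list_objects one from the sample code.

```python
def iter_keys_by_filename_prefix(client, bucket, prefixes):
    """Yield every key in the bucket whose file name starts with one of the prefixes.

    `client` is expected to behave like boto3.client('s3').
    """
    wanted = tuple(prefixes)
    paginator = client.get_paginator('list_objects_v2')
    # No Prefix argument: the whole bucket is listed, page by page
    for page in paginator.paginate(Bucket=bucket):
        # 'Contents' is missing from a page that holds no objects
        for content in page.get('Contents', []):
            key = content['Key']
            # The file name is whatever follows the last '/' in the key
            file_name = key.rsplit('/', 1)[-1]
            if file_name.startswith(wanted):
                yield key
```

Each yielded key can then be handed to download_file as in the sample code. Listing the whole bucket costs more requests than a Prefix-filtered listing, so this is probably only worthwhile when the folders really are unknown.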
