
Download only specific folder in S3 bucket using python boto

The link below shows how to download the entire contents of an S3 bucket. How can I download only the contents of one subfolder? Suppose my bucket has the following emulated folder structure.

S3Folder/S1/file1.c

S3Folder/S1/file2.h

S3Folder/S1/file1.h

S3Folder/S2/file.exe

S3Folder/S2/resource.data

Suppose I am interested only in the S2 folder. How do I isolate its keys in the bucket listing?

local backup of an S3 content

import os
import boto

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucket_name)

# go through the list of files
bucket_list = bucket.list()
for l in bucket_list:
  keyString = str(l.key)
  d = LOCAL_PATH + keyString
  try:
    l.get_contents_to_filename(d)
  except OSError:
    # check if dir exists
    if not os.path.exists(d):
      os.mkdir(d)

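You can download only the objects in one folder by filtering the listing on the key prefix.

So, for your question, you just need to pass the prefix 'S3Folder/S2/' when listing the bucket, for example bucket.list(prefix='S3Folder/S2/') in boto. The keys in your example do not start with a slash, so the prefix has to be the full leading part of the key rather than '/S2'.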
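The prefix approach could look roughly like this with boto3. This is only a sketch under a few assumptions: credentials come from the default boto3 configuration, and the names local_path_for_key and download_prefix are hypothetical helpers made up for illustration, not part of boto3.

```python
import os


def local_path_for_key(key, prefix, dest_dir):
    """Map an S3 key under `prefix` to a file path under `dest_dir`."""
    relative = key[len(prefix):].lstrip("/")
    return os.path.join(dest_dir, *relative.split("/"))


def download_prefix(bucket_name, prefix, dest_dir):
    """Download every object whose key starts with `prefix` into `dest_dir`."""
    import boto3  # imported here so the path helper works without boto3 installed

    bucket = boto3.resource("s3").Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=prefix):
        if obj.key.endswith("/"):
            continue  # zero-byte "folder" placeholder, nothing to download
        target = local_path_for_key(obj.key, prefix, dest_dir)
        # create the local directory tree before writing the file
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        bucket.download_file(obj.key, target)
```

With the structure from the question, calling download_prefix with your bucket name, the prefix "S3Folder/S2/" and a local directory such as "backup" should fetch only file.exe and resource.data.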

FYI: s3 download object using boto3

For more check this

You could do the following:

import os
import boto3

s3_resource = boto3.resource("s3", region_name="us-east-1")

def download_objects():
    root_dir = 'D:/'  # local machine location
    s3_bucket_name = 'S3_Bucket_Name'  # S3 bucket name
    s3_root_folder_prefix = 'sample'  # root folder inside the bucket
    s3_folder_list = ['s3_folder_1', 's3_folder_2', 's3_folder_3']  # subfolders of the root folder to download

    my_bucket = s3_resource.Bucket(s3_bucket_name)
    for obj in my_bucket.objects.filter(Prefix=s3_root_folder_prefix):
        # skip zero-byte "folder" placeholder keys
        if obj.key.endswith('/'):
            continue
        if any(s in obj.key for s in s3_folder_list):
            try:
                path, filename = os.path.split(obj.key)
                # create the local directory tree if it does not exist yet
                os.makedirs(os.path.join(root_dir, path), exist_ok=True)
                my_bucket.download_file(obj.key, os.path.join(root_dir, path, filename))
            except Exception as err:
                print(err)

if __name__ == '__main__':
    download_objects()
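One caveat with the substring test above: 's3_folder_1' also matches a key under 's3_folder_10'. If that matters, a stricter check could compare the folder segment exactly. The helper below is a hypothetical sketch, and it assumes the wanted folders sit directly under the root prefix.

```python
def key_in_folders(key, root_prefix, folders):
    """Return True if `key` is a file inside one of `folders` directly under `root_prefix`."""
    root = root_prefix.strip("/")
    root_parts = root.split("/") if root else []
    parts = key.strip("/").split("/")
    n = len(root_parts)
    # The key must start with the root prefix and contain a folder segment plus a file name.
    if parts[:n] != root_parts or len(parts) < n + 2:
        return False
    return parts[n] in set(folders)
```

It could replace the any(...) condition in the loop, for example key_in_folders(obj.key, s3_root_folder_prefix, s3_folder_list).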
