
Download from Amazon S3: AWS CLI or Boto3?

I have a list of folder names in a txt file like:

folder_B
folder_C

There is a path in an S3 bucket where I have folders like:

folder_A
folder_B
folder_C
folder_D

Each of these folders has subfolders like:

0
1
2
3

For every folder in the text file I have to find the matching folder in S3 and download the contents of only its highest-numbered subfolder.

Doing this with Python and boto3 seems complicated.

Is there a simple way to do this with the AWS command line?

OK, I did it myself. It is really ugly, but it works. I used both boto3 and the AWS CLI:

import subprocess
import boto3

# Read the folder names from the text file, one per line.
folders = []
with open('folders_list.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line:
            folders.append(line)

def download(bucket_name, folder):
    s3_client = boto3.client("s3")
    # List the immediate "subfolders" (common prefixes) under my_path/<folder>/.
    result = s3_client.list_objects(
        Bucket=bucket_name, Prefix="my_path/{}/".format(folder), Delimiter="/")
    subfolders = []
    for i in result['CommonPrefixes']:
        # A prefix looks like "my_path/folder_B/3/"; keep just the number.
        subfolders.append(int(i['Prefix'].split('{}/'.format(folder), 1)[1][:-1]))
    # Let the AWS CLI download the highest-numbered subfolder recursively.
    subprocess.run(['aws', 's3', 'cp',
                    's3://{0}/my_path/{1}/{2}'.format(bucket_name, folder, max(subfolders)),
                    'C:\\Users\\it_is_me\\my_local_folder\\{}'.format(folder),
                    '--recursive'])

for folder in folders:
    download('my_bucket', folder)
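
For reference, here is a minimal sketch of the same job done with boto3 alone, without shelling out to the AWS CLI. It reuses the folders list from above; the bucket name, the my_path prefix, and the local destination are the same placeholder names, and download_with_boto3 is just an illustrative helper:

import os
import boto3

def download_with_boto3(bucket_name, folder, dest_root):
    s3_client = boto3.client("s3")
    # Find the highest-numbered subfolder, as in the version above.
    result = s3_client.list_objects(
        Bucket=bucket_name, Prefix="my_path/{}/".format(folder), Delimiter="/")
    newest = max(int(p['Prefix'].rstrip('/').rsplit('/', 1)[1])
                 for p in result['CommonPrefixes'])
    prefix = "my_path/{0}/{1}/".format(folder, newest)
    # Page through every object under that prefix and download each one.
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get('Contents', []):
            if obj['Key'].endswith('/'):  # skip folder-marker keys
                continue
            rel_path = obj['Key'][len(prefix):]
            local_path = os.path.join(dest_root, folder, rel_path)
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3_client.download_file(bucket_name, obj['Key'], local_path)

for folder in folders:
    download_with_boto3('my_bucket', folder, r'C:\Users\it_is_me\my_local_folder')

Using the list_objects_v2 paginator keeps this correct even when a subfolder holds more than 1,000 objects, which is where a single list call would stop.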

Here's a simple bash one-liner (assuming the output format of aws s3 ls has the file name as the last column):

for folder in $(cat folder.txt); do \
  aws s3 ls s3://bucket-prefix/$folder/ | awk '{print $NF}' \
  | sort -r | head -n1 \
  | xargs -I {} aws s3 cp s3://bucket-prefix/$folder/{} $folder/{} --recursive \
  ; done

aws-cli takes care of creating the directories if they are missing. (Tested on Ubuntu)
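
One caveat on the sort step: plain sort -r compares lexicographically, so once the subfolder numbers pass a single digit it would rank 9/ ahead of 10/. If that can happen, sort -rn (reverse numeric sort) is the safer choice.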
