Copy folders from S3 bucket with specific prefix

I have an S3 bucket

aws s3 ls s3://myBucket/
                           PRE 2020032600/
                           PRE 2020032700/
                           PRE 2020032800/
                           PRE results_2020011200/
                           PRE results_2020011300/
                           PRE results_2020011400/
                           PRE results_2020011500/

I want to copy locally only the folders whose names start with results_. I tried a wildcard, which does not work because the S3 CLI does not expand wildcards in the source path:

aws s3 cp s3://myBucket/*something /Users/myName/myFolder/ --recursive

You can use the --include and --exclude arguments:

aws s3 cp s3://myBucket/ /Users/myName/myFolder/ --recursive --exclude "*" --include "results_*"

All files in the bucket will be excluded except those whose keys start with results_. The order of the --exclude and --include arguments is important: filters are evaluated in the order given, and a later filter takes precedence over an earlier one.
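
To see why the order matters, here is a quick illustration against the same bucket; --dryrun prints what would be copied without transferring anything:

# Correct: exclude everything, then re-include the results_ prefixes.
aws s3 cp s3://myBucket/ /Users/myName/myFolder/ --recursive \
    --exclude "*" --include "results_*" --dryrun

# Wrong: the later --exclude "*" takes precedence over the earlier
# --include, so nothing matches and nothing is copied.
aws s3 cp s3://myBucket/ /Users/myName/myFolder/ --recursive \
    --include "results_*" --exclude "*" --dryrun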

Will aws s3 sync s3://myBucket/this_folder/object_file C:\Users\Desktop also create "this_folder" in C:\Users\Desktop? If not, what would be the solution to copy/sync while keeping the S3 folder structure? I have many files in different S3 bucket folders sorted by year, month, and day, and I would like to copy them locally with the folder structure kept.
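
For what it's worth: sync mirrors everything under the source prefix but does not recreate the prefix directory itself, so to keep it you can name the prefix in the destination path. A minimal sketch, using the folder name from the comment above:

# sync copies the *contents* of this_folder; repeating the prefix in the
# destination preserves the hierarchy underneath it locally.
aws s3 sync s3://myBucket/this_folder/ C:\Users\Desktop\this_folder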

I had a very similar requirement. I needed to copy a given subset of objects, located under different 'folders' (prefixes) in a source S3 bucket containing 1K+ objects, to S3 buckets in different AWS accounts.

Combining the approach mentioned here by @franklinsijo with the approach mentioned in this article, I ended up writing two simple scripts: one copying objects from the source S3 bucket to local storage, the other copying from local storage to the destination S3 bucket, honouring the 'folder' structure.

Script 1: copy a subset of objects from the source S3 bucket to local storage:

#!/bin/bash

# Placeholders: fill these in before running.
S3_BUCKET=<originating-bucket>
LOCAL_FOLDER=<local-work-folder>
AWS_PROFILE=<account-1>

echo "Getting selective files from S3 bucket: ${S3_BUCKET} ..."
# Exclude everything first, then re-include only the wanted objects;
# included keys keep their relative paths under ${LOCAL_FOLDER}.
aws s3 --profile "${AWS_PROFILE}" cp "s3://${S3_BUCKET}/" "${LOCAL_FOLDER}" --recursive \
    --exclude "*" \
    --include "file 1.png" \
    --include "file34.jpg" \
    --include "fileAB.pdf" \
    --include "filezzz.jpeg" \
    --include "file_123" \
    --include "prefix-l1/prefix l2/file-ABC.pdf" \
    --include "prefix-p45/blankPDF.pdf"
echo "Done"

Script 2: copy from local storage to the destination S3 bucket:

#!/bin/bash

# Placeholders: fill these in before running.
LOCAL_FOLDER=<local-work-folder>
S3_BUCKET=<destination-bucket>
AWS_PROFILE=<account-2>

echo "Copying content of: ${LOCAL_FOLDER} to S3 bucket: ${S3_BUCKET} ..."
for entry in "${LOCAL_FOLDER}"/*; do
    name=$(basename "$entry")  # name of the file or directory
    if [[ -d "$entry" ]]; then
        # Copy a folder recursively, keeping it as a prefix in the bucket.
        aws s3 --profile "${AWS_PROFILE}" cp --recursive "${LOCAL_FOLDER}/${name}" "s3://${S3_BUCKET}/${name}/"
    else
        # Copy a single file to the bucket root.
        aws s3 --profile "${AWS_PROFILE}" cp "${LOCAL_FOLDER}/${name}" "s3://${S3_BUCKET}/"
    fi
done
echo "Done"

The beauty of this approach is that you only need the aws-cli, with account profiles (and any other S3 CLI options); no third-party tools are required. Use it as you see fit; it is easy to adapt into other bash scripts and the like.

Just note that if you need to run the above scripts multiple times, for example against multiple S3 targets, you should clear the local working folder before copying new objects into it.
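
One cautious way to do that, assuming the same LOCAL_FOLDER placeholder; the ${VAR:?} expansion aborts if the variable is unset or empty, so the command can never expand to rm -rf /*:

# Abort (instead of deleting from /) if LOCAL_FOLDER is unset or empty.
rm -rf "${LOCAL_FOLDER:?}"/*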
