AWS CLI: S3 sync only selected files?

I need to synchronize two AWS S3 buckets, but only the files in a list. This is the scenario:

BucketA:

File1.jpg Deleted  
File2.jpg Modified
File3.jpg Deleted
File4.jpg Modified
File5.jpg Modified
File6.jpg New

BucketB:

File1.jpg 
File2.jpg 
File3.jpg 
File4.jpg 
File5.jpg 

I'm looking for a command like this:

aws s3 sync s3://BucketA s3://BucketB --delete --exclude "*" --include "File1.jpg;File2.jpg;File4.jpg"

The resulting BucketB must look like this:

File1.jpg Deleted
File2.jpg Modified
File3.jpg Unchanged
File4.jpg Modified
File5.jpg Unchanged

Any ideas?

It looks like this is achievable, except for the deletion part.

This command will sync only the specified files:

aws s3 sync s3://bucketA s3://bucketB --exclude "*" --include "File1.jpg" --include "File2.jpg" --include "File4.jpg"
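Each --include takes a single pattern, so a list of names becomes one --include flag per name. When the names live in a text file, the flags can be generated instead of typed. A minimal bash sketch, assuming a list file with one key per line; the function name `sync_selected` and the file `files.txt` are hypothetical, not part of the AWS CLI:

```shell
# Hypothetical helper: sync only the keys listed in LIST_FILE.
# Usage: sync_selected SRC DST LIST_FILE [extra aws s3 sync options...]
sync_selected() {
  local src="$1" dst="$2" list="$3"
  shift 3
  # Exclude everything first; the later --include filters take precedence.
  local args=(--exclude "*")
  local key
  # Read one key per line; "|| [[ -n $key ]]" keeps a last line without newline.
  while IFS= read -r key || [[ -n "$key" ]]; do
    [[ -z "$key" ]] && continue   # skip blank lines
    args+=(--include "$key")
  done < "$list"
  # Any remaining arguments (e.g. --dryrun) are passed through unchanged.
  aws s3 sync "$src" "$dst" "${args[@]}" "$@"
}
```

For example, with File1.jpg, File2.jpg and File4.jpg in files.txt: `sync_selected s3://bucketa s3://bucketb files.txt`. This only builds the same command line as above, so it does not change how --delete behaves.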

However, the --delete parameter seems to consider only the files in BucketA that match the --include parameters, so all other files are 'invisible' and therefore get deleted from BucketB.

This command:

aws s3 sync s3://bucketA s3://bucketB --delete --exclude "*" --include "File1.jpg" --include "File2.jpg" --include "File4.jpg"

actually deletes all files except File2.jpg and File4.jpg. So it doesn't look like you can do a selective delete in the expected way.
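Because the combination of --delete and filters is easy to misjudge, it can be previewed with the --dryrun flag of aws s3 sync, which prints the operations it would perform (marked "(dryrun)") without touching either bucket. A bash sketch; the wrapper name `preview_selective_sync` is hypothetical:

```shell
# Hypothetical wrapper: show what a filtered sync with --delete would do.
# --dryrun makes aws s3 sync print its planned operations instead of running them.
# Usage: preview_selective_sync SRC DST KEY...
preview_selective_sync() {
  local src="$1" dst="$2"
  shift 2
  local args=(--delete --dryrun --exclude "*")
  local key
  for key in "$@"; do
    args+=(--include "$key")   # one --include per key
  done
  aws s3 sync "$src" "$dst" "${args[@]}"
}
```

For example, `preview_selective_sync s3://bucketa s3://bucketb File1.jpg File2.jpg File4.jpg` lists the uploads and deletes that the command above would carry out, so you can check which keys would be removed before running it for real.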

Here's a script to test all of the above:

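# foo and bar are local files with different contents
# (24 and 49 bytes, judging by the listings below)
aws s3 cp foo s3://bucketa/File1.jpg
aws s3 cp foo s3://bucketa/File2.jpg
aws s3 cp foo s3://bucketa/File3.jpg
aws s3 cp foo s3://bucketa/File4.jpg
aws s3 cp foo s3://bucketa/File5.jpg
# mirror the initial state to bucketb
aws s3 sync s3://bucketa s3://bucketb
# reproduce the scenario: delete File1/File3, add File6, modify File2/File4/File5
aws s3 rm s3://bucketa/File1.jpg
aws s3 rm s3://bucketa/File3.jpg
aws s3 cp foo s3://bucketa/File6.jpg
aws s3 cp bar s3://bucketa/File2.jpg
aws s3 cp bar s3://bucketa/File4.jpg
aws s3 cp bar s3://bucketa/File5.jpg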

aws s3 ls s3://bucketa
2015-07-23 08:50:44         49 File2.jpg
2015-07-23 08:50:49         49 File4.jpg
2015-07-23 08:50:53         49 File5.jpg
2015-07-23 08:50:20         24 File6.jpg

aws s3 ls s3://bucketb
2015-07-23 08:49:35         24 File1.jpg
2015-07-23 08:49:35         24 File2.jpg
2015-07-23 08:49:36         24 File3.jpg
2015-07-23 08:49:36         24 File4.jpg
2015-07-23 08:49:36         24 File5.jpg 

aws s3 sync s3://bucketa s3://bucketb --exclude "*" --include "File1.jpg" --include "File2.jpg" --include "File4.jpg"

There is no way to sync only specific files, but there are a few (bad) workarounds.

  1. As @John Rotenstein mentioned, you could use --exclude="*" --include="FILEPATH"

    • this solution works for one or two files
    • if you want to sync more files, it will take longer than simply syncing all files
    • the reason it takes longer is that the 'sync' command recursively iterates over all files in your target directory
    • for each argument passed with --include, there is one iteration over all files
    • sync checks whether the pattern passed via --include matches a file path
    • you can also pass wildcards instead of paths via --include
    • you can verify this yourself with the --debug option
  2. You could use the 'cp' command instead of 'sync' and append your file path to the path of your target directory

    • this method does not check whether a sync is needed; it just copies the file
    • the whole copy command is executed separately for each file path, which is very time-consuming
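Workaround 2 can be scripted as a loop, and the same loop can also cover the deletion case from the question. A minimal bash sketch, assuming bucket names (not s3:// URLs) as arguments; the function name `copy_or_delete_selected` is hypothetical. It checks each key with `aws s3api head-object`, copies it if it exists in the source, and removes it from the destination only when the error text contains "(404)" (an assumption about how the CLI reports a missing key). It makes one or two CLI calls per file, so it shares the slowness described above.

```shell
# Hypothetical helper (not part of the AWS CLI) for workaround 2:
# for each listed key, copy it from SRC_BUCKET to DST_BUCKET, or delete it
# from DST_BUCKET if the source reports it as missing.
# Usage: copy_or_delete_selected SRC_BUCKET DST_BUCKET KEY...
copy_or_delete_selected() {
  local src="$1" dst="$2"
  shift 2
  local key err
  for key in "$@"; do
    # Capture only stderr of head-object; its exit status decides the branch.
    if err=$(aws s3api head-object --bucket "$src" --key "$key" 2>&1 >/dev/null); then
      aws s3 cp "s3://$src/$key" "s3://$dst/$key"
    elif [[ "$err" == *"(404)"* ]]; then
      # Assumption: a missing key shows up as "(404)" in the CLI error text.
      aws s3 rm "s3://$dst/$key"
    else
      # Any other failure (permissions, network, ...) is reported, not treated as a delete.
      echo "skipping $key: $err" >&2
    fi
  done
}
```

For the scenario in the question: `copy_or_delete_selected bucketa bucketb File1.jpg File2.jpg File4.jpg` would delete File1.jpg from bucketb and copy File2.jpg and File4.jpg, leaving File3.jpg and File5.jpg untouched.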

So for a few files you can use these workarounds, but with several hundred files, as in my case, they are not practical.

If you want to push for this, I've already opened an issue on GitHub: https://github.com/aws/aws-cli/issues/5167

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM