
Run multiple AWS sync commands in parallel to copy data from multiple source S3 buckets to multiple destination S3 buckets

I want to copy data from bucket A to bucket M, bucket B to bucket N, and so on. To speed up the process, I am running the sync commands in parallel like this:

source_buckets=(A B C D E F)
destination_buckets=(M N O P Q R)
parallel -j 6 "aws s3 sync s3://{1}/$release_version/ s3://{2}/" ::: "${source_buckets[@]}" ::: "${destination_buckets[@]}"

But with this command, data from all source buckets (A, B, C, D, E, F) is copied to every destination bucket. Can someone please help me fix it?

Also, is there any alternative way to speed up the cp/sync process?

You are using GNU parallel here, which is doing what it is designed to do, but not what you want: by default, multiple ::: argument lists are combined as a cross product, so every source is paired with every destination.

I did a quick test and confirmed this without the AWS CLI:

parallel echo ::: a b c ::: d e f

does the same thing:

a d
a e
a f
b d
b e
b f
c d
c e
c f

So there is a --link option that pairs the argument lists element by element instead:

parallel --link echo ::: a b c ::: d e f

a d
b e
c f
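
Applied to your original command, a minimal sketch (reusing the bucket arrays and the $release_version variable from the question) would look like this, so that A syncs only to M, B only to N, and so on:

# Pair each source bucket with its corresponding destination bucket
# and run up to 6 sync jobs at once.
source_buckets=(A B C D E F)
destination_buckets=(M N O P Q R)

parallel --link -j 6 \
  "aws s3 sync s3://{1}/$release_version/ s3://{2}/" \
  ::: "${source_buckets[@]}" ::: "${destination_buckets[@]}"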

Regarding alternative ways: boto3 is the library behind the AWS CLI, and you can use the full power of Python to do all kinds of things with it; see e.g. Sync two buckets through boto3.
