
Run multiple AWS sync commands in parallel to copy data from multiple source S3 buckets to multiple destination S3 buckets

I want to copy data from bucket A to bucket M, bucket B to bucket N, and so on. To speed up the process, I am running the sync commands in parallel like this:

source_buckets=(A B C D E F)
destination_buckets=(M N O P Q R)
parallel -j 6 "aws s3 sync s3://{1}/$release_version/ s3://{2}/" ::: "${source_buckets[@]}" ::: "${destination_buckets[@]}"

But with this command, data from all source buckets (A, B, C, D, E, F) is copied to every destination bucket. Can someone please help me fix it?

Also, is there any alternative way to speed up the cp/sync process?

You are using GNU parallel here, which is doing what it is designed to do, but not what you want: by default, multiple ::: argument lists are combined as a cross product, so every source is paired with every destination.

I did a quick test and confirmed this without the AWS CLI:

parallel echo ::: a b c ::: d e f

does the same thing:

a d
a e
a f
b d
b e
b f
c d
c e
c f

So there is a --link option that pairs the argument lists element by element instead:

parallel --link echo ::: a b c ::: d e f

a d
b e
c f
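
Applied to your original command, a minimal sketch (reusing the bucket arrays and the $release_version variable from the question) would look like this, so that A syncs only to M, B only to N, and so on:

# Pair each source bucket with its corresponding destination bucket
# and run up to 6 sync jobs at once.
source_buckets=(A B C D E F)
destination_buckets=(M N O P Q R)

parallel --link -j 6 \
  "aws s3 sync s3://{1}/$release_version/ s3://{2}/" \
  ::: "${source_buckets[@]}" ::: "${destination_buckets[@]}"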

Regarding alternative ways: boto3 is the library behind the AWS CLI, and you can use the full power of Python to do all kinds of things with it; see e.g. Sync two buckets through boto3.
