What will happen if I paginate over S3 bucket files while new files are being put concurrently?

I have lots of files in a "source" S3 bucket and I want to copy them to a "dest" bucket. At the same time, new files are being put into the "source" bucket. Will the paginator see the newly uploaded files? And if not, how can I track them, given that paginating again is costly?

My code (using aws-sdk-go-v2):

paginator := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
    Bucket: bucket,
})

for paginator.HasMorePages() {
    page, err := paginator.NextPage(ctx)
    if err != nil {
        log.Errorf("error: %+v", err)
        return
    }
    for _, obj := range page.Contents {
        // copy object
    }
}

As mentioned in the comments you should definitely test your code but...

it's worth mentioning that since December 2020 S3 has provided strong read-after-write consistency.

In other words,

all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent. What you write is what you will read, and the results of a LIST will be an accurate reflection of what's in the bucket. This applies to all existing and new S3 objects, works in all regions [...]

More on that in this blog post.

And according to the AWS SDK for Go documentation:

Pagination methods iterate over a list operation until the method retrieves the last page of results or until the callback function returns false
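To make the question concrete, here is a toy, in-memory model of paginated listing with a concurrent writer. It is only a sketch under a stated assumption: that the listing walks keys in sorted order and each page resumes after the last key returned. Real S3 continuation tokens are opaque, and `listPage` and `listAll` are hypothetical helpers, not SDK calls, so treat this as an illustration to verify against your own bucket rather than a statement of how S3 is implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// listPage is a toy model of one LIST call (an assumption, not the S3 API):
// it returns up to max keys strictly greater than startAfter, in sorted order.
func listPage(bucket map[string]bool, startAfter string, max int) []string {
	var keys []string
	for k := range bucket {
		if k > startAfter {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	if len(keys) > max {
		keys = keys[:max]
	}
	return keys
}

// listAll pages through the bucket and calls between after each page,
// which lets the caller simulate uploads that arrive mid-listing.
func listAll(bucket map[string]bool, pageSize int, between func(page int)) []string {
	var seen []string
	cursor := ""
	for page := 1; ; page++ {
		keys := listPage(bucket, cursor, pageSize)
		if len(keys) == 0 {
			return seen
		}
		seen = append(seen, keys...)
		cursor = keys[len(keys)-1] // resume after the last key returned
		between(page)
	}
}

func main() {
	bucket := map[string]bool{"b": true, "c": true, "d": true, "e": true}
	seen := listAll(bucket, 2, func(page int) {
		if page == 1 { // uploads that land after the first page was returned
			bucket["a"] = true // sorts before the cursor "c"
			bucket["z"] = true // sorts after the cursor
		}
	})
	fmt.Println(seen) // prints [b c d e z]
}
```

In this model the late upload `z` is picked up because it sorts after the cursor, while `a` is missed because the listing has already moved past it. If your bucket behaves this way, a single pass cannot be relied on to catch every concurrent upload, which is why a second mechanism for new objects is useful.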

If you're simply reading the files and writing them to a second bucket with their content unchanged, S3 can do this for you with bucket replication.

If you need to do some processing beyond what S3's built-in replication can do, the best way is with EventBridge, which will automatically handle incoming objects. For resiliency, I recommend connecting EventBridge to an SQS queue and the queue to a Lambda function. You can then run an idempotent version of your syncing program to pick up any objects that arrived before EventBridge was set up. Find some way to know whether an object has already been replicated (object tags might help here), and you can rerun the program until all objects are synced.
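As a rough sketch of what "idempotent" means for the backfill program, the example below uses plain maps as stand-ins for the two buckets. `Bucket` and `SyncMissing` are hypothetical names for illustration only. A real version would list the source with the paginator from the question and check the destination (or an object tag) before copying.

```go
package main

import (
	"fmt"
	"sort"
)

// Bucket is a hypothetical in-memory stand-in for an S3 bucket: key -> content.
type Bucket map[string]string

// SyncMissing copies every object that src has and dst lacks, and returns
// the copied keys in sorted order. Running it again copies nothing new,
// so it is safe to rerun alongside an event-driven path.
func SyncMissing(src, dst Bucket) []string {
	var copied []string
	for key, body := range src {
		if _, ok := dst[key]; ok {
			continue // already synced, skip
		}
		dst[key] = body
		copied = append(copied, key)
	}
	sort.Strings(copied)
	return copied
}

func main() {
	src := Bucket{"a.txt": "1", "b.txt": "2", "c.txt": "3"}
	dst := Bucket{"a.txt": "1"} // e.g. already copied by the event-driven path

	fmt.Println(SyncMissing(src, dst)) // prints [b.txt c.txt]
	fmt.Println(SyncMissing(src, dst)) // prints []
}
```

Because already-present keys are skipped, it does not matter whether an object reached the destination through the event path or an earlier backfill run.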

Either way, new objects will be synced in the future.
