
AWS S3 client race condition solutions

The problem my team and I have been trying to solve involves multiple EC2 instances, each with its own independent, parallel access to the same S3 bucket. The issue is a race condition that arises when each client attempts to download the same file in that S3 bucket. Each client tries to read the file, run some business logic and then delete the file. Since there are many opportunities for delay, the race condition occurs and multiple instances end up running the business logic on the same file.

Some advice would be greatly appreciated on how engineers have been implementing locking mechanisms with their S3 clients.

Our brainstormed approach: upload a .lock file to the S3 bucket with information about which instance currently holds the lock. When the instance that holds the lock finishes the process, it deletes its lock. (Issues arise while the lock file is being uploaded - a race condition within the locking mechanism itself.)
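
For illustration, a minimal sketch of that brainstormed approach, assuming boto3; the bucket and key names are hypothetical. The gap between the existence check and the upload is exactly where two instances can both conclude the lock is free:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-data-bucket"            # hypothetical
LOCK_KEY = "work/input.csv.lock"     # hypothetical

def try_acquire_lock(instance_id):
    try:
        s3.head_object(Bucket=BUCKET, Key=LOCK_KEY)
        return False                 # someone else already holds the lock
    except ClientError as e:
        if e.response["Error"]["Code"] != "404":
            raise
    # <-- race window: another instance can pass the same check right here
    s3.put_object(Bucket=BUCKET, Key=LOCK_KEY, Body=instance_id.encode())
    return True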

hmmm... you're going to have a race condition with the lock file now... multiple nodes are going to upload the same lock file!

So you'll need something a little more sophisticated, as S3 does not have any concurrency controls built in, which can be quite inconvenient.

The obvious way to deal with this is to use SQS (Simple Queue Service) - it is built for concurrency.

So in your case, all of the nodes connect to the same queue, waiting for work. Something will add an element to the queue for each file in S3 that needs to be processed. One of the nodes will pick up the entry from the queue, process the file, delete the file and delete the entry from the queue.

That way you don't get multiple nodes processing the same file, and you get elegant scaling, etc.
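
A minimal sketch of that consumer loop, assuming boto3 and a message body of the form {"bucket": ..., "key": ...}; the queue URL and message format are assumptions, not a definitive implementation:

import json
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-work-queue"  # hypothetical

def process(data):
    pass  # your business logic

def worker_loop():
    while True:
        # Long-poll so idle workers don't hammer the API
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])               # assumed message format
            obj = s3.get_object(Bucket=body["bucket"], Key=body["key"])
            process(obj["Body"].read())
            s3.delete_object(Bucket=body["bucket"], Key=body["key"])
            # Deleting the message marks the work done; if the worker crashes before this,
            # the message reappears after the visibility timeout and another node retries it
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])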

The outstanding issue, however, is what scans S3 in the first place to put work on the queue. This is probably where your difficulty will arise.

I think you have two options:

  1. Use a Lambda. This is rather elegant. You can configure a Lambda to fire when something gets added to S3. The Lambda then registers a pointer to the file on the queue, to be picked up by the EC2 instances for processing (see the sketch after this list).

    Problem with the Lambda is that your application becomes a little more distributed, i.e. you can't just look in the code for the behaviour, you've got to look in the Lambda as well. Though I guess this Lambda is not particularly heavyweight.

  2. Let all the EC2 instances monitor S3, but when they find work to do they add it to a FIFO queue. This is a relatively new queue type from AWS where you get guaranteed ordering and exactly-once processing. Thus you can guarantee that even though multiple nodes found the same S3 file, only one node will process it.
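
A hedged sketch combining the two options: a Lambda triggered by S3 ObjectCreated events that pushes a pointer onto a FIFO queue, using the object key as the deduplication id so duplicate enqueues of the same file collapse into one entry. The queue URL and group id are assumptions; hash the key if it can exceed the 128-character deduplication-id limit:

import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-work-queue.fifo"  # hypothetical

def handler(event, context):
    # Fired by an S3 ObjectCreated event; enqueue a pointer to each new object
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": bucket, "key": key}),
            MessageGroupId="s3-files",       # required for FIFO queues
            MessageDeduplicationId=key,      # same key enqueued twice => only one entry
        )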

If it is possible with your current setup and application, I would think about configuring event notifications on the S3 bucket to send a message to an SQS queue (when a file is uploaded, for instance), and then using an Elastic Beanstalk Worker environment to consume the messages from the queue and process those files according to your application.

Worker Environments Docs
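
The worker environment's daemon POSTs each queue message to an HTTP path on your application and deletes the message when it gets a 200 back, so the consumer can be as small as this sketch (assuming Flask, and assuming the raw S3 event notification is what lands on the queue; path and names are illustrative):

import json
import boto3
from flask import Flask, request

app = Flask(__name__)
s3 = boto3.client("s3")

@app.route("/", methods=["POST"])                    # path the worker daemon posts to
def handle_message():
    event = json.loads(request.data)                 # the forwarded SQS message body
    record = event["Records"][0]                     # assumes an S3 event notification
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    # ... run the business logic on `data` ...
    s3.delete_object(Bucket=bucket, Key=key)
    return "", 200                                   # 200 tells the daemon to delete the message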

If you don't want to use AWS-specific tech (e.g. SQS or Lambdas), you have 2 options:

Existing Database

If you have an existing database you can leverage, you can use advisory locks (e.g. what Postgres offers) as follows. When a process wants to work on the files:

  1. it first checks if the lock is available. If not, it will have to wait on the lock.
  2. once it acquires the lock, it can do the work it needs, including deleting the file.
  3. it finally releases the lock.

Conceptually, this is very similar to the .lock file setup you mention.
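
A sketch of steps 1-3 with Postgres advisory locks, assuming psycopg2; the DSN is hypothetical. It uses the non-blocking pg_try_advisory_lock so an instance that loses the race simply skips the file instead of waiting:

import hashlib
import psycopg2

def lock_id_for(key):
    # Advisory locks take a 64-bit integer, so derive a stable signed id from the S3 key
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big", signed=True)

def process_with_lock(key):
    conn = psycopg2.connect("dbname=app user=worker")    # hypothetical DSN
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_try_advisory_lock(%s)", (lock_id_for(key),))
            if not cur.fetchone()[0]:
                return False                              # another instance holds the lock
            try:
                # ... read the file, run the business logic, delete the file ...
                return True
            finally:
                cur.execute("SELECT pg_advisory_unlock(%s)", (lock_id_for(key),))
                conn.commit()
    finally:
        conn.close()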

Use external services

Something like lockable. If you're using Python, you can use their Python client:

$ pip install lockable-dev

from lockable import Lock

with Lock('my-lock-name'):
    pass  # do stuff: read the file, run the business logic, delete it

If you're not using Python, you can still use their HTTP endpoints; something like

  1. curl https://api.lockable.dev/v1/acquire/my-s3-file-lock
  2. Work on the file
  3. curl https://api.lockable.dev/v1/release/my-s3-file-lock

I would try to move the file to a staging bucket. Only one process will succeed; the others will fail. The one that succeeds takes the job.
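
A sketch of that idea, assuming boto3 and hypothetical bucket names; a copy that fails because the source object is already gone is treated as losing the race (note that copy-plus-delete is not atomic, so this narrows the window rather than formally eliminating it):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
SOURCE_BUCKET = "incoming-files"     # hypothetical
STAGING_BUCKET = "staging-files"     # hypothetical

def try_claim(key):
    try:
        s3.copy_object(
            Bucket=STAGING_BUCKET,
            Key=key,
            CopySource={"Bucket": SOURCE_BUCKET, "Key": key},
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchKey":
            return False                               # another instance already moved it
        raise
    s3.delete_object(Bucket=SOURCE_BUCKET, Key=key)    # finish the "move"
    return True                                        # process the staged copy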
