
AWS S3 client race condition solutions

The problem my team and I have been trying to solve involves multiple EC2 instances, each with its own independent, parallel access to the same S3 bucket. A race condition arises when each client attempts to download the same file from that bucket. Each client tries to read the file, run some business logic and then delete the file. Since there are many opportunities for delay, the race condition occurs and multiple instances end up running the business logic.
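For reference, a minimal sketch of the flow being described, using boto3 with hypothetical bucket/key names and a hypothetical run_business_logic function. Nothing in it is atomic, which is why two instances can both read the file before either one deletes it:

import boto3

s3 = boto3.client("s3")
BUCKET = "shared-bucket"          # hypothetical bucket name
KEY = "incoming/work-item.json"   # hypothetical key

def process_once():
    # 1. read the file
    body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    # 2. run the business logic (hypothetical function)
    run_business_logic(body)
    # 3. delete the file -- by now another instance may already have read it
    s3.delete_object(Bucket=BUCKET, Key=KEY)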

Some advice would be greatly appreciated on how engineers have been implementing locking mechanisms with their S3 clients.

Our brainstormed approach: upload a .lock file to the S3 bucket containing information about which instance currently holds the lock. When the instance that holds the lock finishes the process, it deletes its lock. (Issues arise while the lock file is being uploaded - a race condition in the locking mechanism itself.)
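For illustration, a minimal boto3 sketch of that idea (hypothetical names). The gap between the existence check and the upload is exactly the race pointed out below: two instances can both see "no lock" and both upload their own lock file.

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "shared-bucket"                    # hypothetical
LOCK_KEY = "incoming/work-item.json.lock"   # hypothetical

def try_acquire_lock(instance_id):
    try:
        s3.head_object(Bucket=BUCKET, Key=LOCK_KEY)
        return False                # someone else already holds the lock
    except ClientError as err:
        if err.response["Error"]["Code"] != "404":
            raise
    # <-- race window: another instance can pass the check here as well
    s3.put_object(Bucket=BUCKET, Key=LOCK_KEY, Body=instance_id.encode())
    return True

def release_lock():
    s3.delete_object(Bucket=BUCKET, Key=LOCK_KEY)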

Hmmm... you're going to have a race condition with the lock file now... multiple nodes are going to upload the same lock file!

So you'll need something a little more sophisticated, as S3 does not have any concurrency control built in, and this can be quite inconvenient.

The obvious way to deal with this is to use SQS (Simple Queue Service) - this is built for concurrency.

So in your case, all of the nodes connect to the same queue, waiting for work from the queue. Something or other will add an element to the queue for each file in S3 that needs to be processed. One of the nodes will pick up the entry from the queue, process the file, delete the file and delete the entry from the queue.

That way you don't get multiple processing of the same file, and you get elegant scaling etc.
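A minimal sketch of what each node's worker loop might look like, using boto3 with a hypothetical queue URL and message format (and the hypothetical run_business_logic function from above). The queue's visibility timeout hides an in-flight message from the other nodes, so each file is handed to one node at a time:

import json
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-work-queue"  # hypothetical

while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        ref = json.loads(msg["Body"])            # assumed to contain the bucket and key
        obj = s3.get_object(Bucket=ref["bucket"], Key=ref["key"])
        run_business_logic(obj["Body"].read())
        s3.delete_object(Bucket=ref["bucket"], Key=ref["key"])
        sqs.delete_message(QueueUrL := QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"]) if False else sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])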

The outstanding issue however is what is scanning S3 in the first place to put work on the queue. This is probably where your difficulty will arise.

I think you have two options:

  1. Use a Lambda. This is rather elegant. You can configure a Lambda to fire when something gets added to S3. This Lambda will then put a pointer to the file on the queue, to be picked up by the EC2 instances for processing (a sketch combining this with the FIFO queue from option 2 follows this list).

     The problem with the Lambda is that your application is a little more distributed, i.e. you can't just look in the code for the behaviour, you've got to look in the Lambda as well. Though I guess this Lambda is not particularly heavyweight.

  2. Let all the EC2 instances monitor S3, but when they find work to do they add the work to a FIFO queue. This is a relatively new queue type from AWS that gives you guaranteed ordering and exactly-once processing. Thus you can guarantee that even though multiple nodes found the same S3 file, only one node will process it.
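For option 1, a minimal sketch of such a Lambda handler (hypothetical queue URL), sending to a FIFO queue so it also covers the exactly-once idea from option 2. The deduplication ID is derived from the object key, so duplicate notifications for the same file collapse into one queue entry:

import hashlib
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-work-queue.fifo"  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": bucket, "key": key}),
            MessageGroupId="s3-files",
            # hashing keeps the deduplication ID within SQS's 128-character limit
            MessageDeduplicationId=hashlib.sha256(key.encode()).hexdigest(),
        )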

If it is possible with your current setup and application, I would think about configuring event notifications on the S3 bucket to send a message to an SQS queue (when a file is uploaded, for instance), and then using an Elastic Beanstalk Worker environment to consume the messages from the queue in order to process those files according to your application.

Worker Environments Docs

If you don't want to use AWS-specific tech (e.g. SQS or Lambdas), you have 2 options:

Existing Database

If you have an existing database you can leverage, you can use advisory locks (e.g. what Postgres offers) as follows (a sketch is given after the list). When a process wants to work on the files:

  1. it first checks if the lock is available. If not, it will have to wait on the lock.
  2. once it acquires the lock, it can do the work it needs, including deleting the file.
  3. it finally releases the lock.
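A minimal sketch of this flow with Postgres advisory locks via psycopg2, using a hypothetical connection string, lock key and work function. pg_advisory_lock blocks until the lock is free, so only one process works on the file at a time:

import psycopg2

LOCK_KEY = 42  # hypothetical: any bigint all processes agree on, e.g. a hash of the S3 key

conn = psycopg2.connect("dbname=app user=app")  # hypothetical connection settings

with conn.cursor() as cur:
    cur.execute("SELECT pg_advisory_lock(%s)", (LOCK_KEY,))        # 1. wait for and acquire the lock
    try:
        process_and_delete_s3_file()                                # 2. hypothetical: do the work
    finally:
        cur.execute("SELECT pg_advisory_unlock(%s)", (LOCK_KEY,))   # 3. release the lock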

Conceptually, this is very similar to the .lock file setup you mention.

Use external services

Something like lockable. If you're using Python, you can use their Python client:

$ pip install lockable-dev

from lockable import Lock

with Lock('my-lock-name'):
    # do stuff with the S3 file while holding the lock

If you're not using Python, you can still use their HTTP endpoints; something like:

  1. curl https://api.lockable.dev/v1/acquire/my-s3-file-lock
  2. Work on the file
  3. curl https://api.lockable.dev/v1/release/my-s3-file-lock

I would try to move the file to a staging bucket. Only one process will succeed; the others will fail. The one that succeeds takes the job.
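A minimal boto3 sketch of this idea, with hypothetical bucket names. If another worker has already moved and deleted the source object, copy_object fails with a 404/NoSuchKey error and this worker skips the file; note that copy plus delete is not a single atomic operation, so this narrows the race window rather than removing it entirely:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def try_claim(source_bucket, staging_bucket, key):
    try:
        s3.copy_object(
            Bucket=staging_bucket,
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
    except ClientError as err:
        if err.response["Error"]["Code"] in ("NoSuchKey", "404"):
            return False   # another worker already claimed this file
        raise
    s3.delete_object(Bucket=source_bucket, Key=key)
    return True            # this worker takes the job and processes the staged copy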
