How to handle race conditions in a Web Service?

I implemented a Web Service with Java Servlets.

I have the following setup: there is a database that holds 'job' entries. Each job has a status such as 'executing', 'in queue', or 'finished'. When a user starts a new job, an entry is created in the database with the job and the status 'in queue'.

The job should only be executed if fewer than five other jobs are currently executing. If five others are already executing, the status needs to stay 'in queue', and a cron job will handle the execution of this job later.

If fewer than five jobs are executing at the moment, my script will execute this job. But what if, between my script asking the database how many jobs are being executed and the script starting to execute the job, another request from another user creates a job and also gets 'four executing jobs' as the result from the database?

Then there would be a race condition and six jobs would be executed.

How can I prevent something like that? Any advice? Thank you very much!

If I understand correctly, and you have control over the application layer that makes the requests to the DB, you could use semaphores to control who is accessing the DB.

Semaphores, in a way, are like traffic lights. They grant access to the critical code to at most N threads at a time. So you could set N to 5 and allow only the threads inside the critical code to change their job's status to 'executing', and so on.

Here is a nice tutorial about using them.
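A minimal sketch of the semaphore idea, assuming all requests are served by a single JVM so that one in-memory `java.util.concurrent.Semaphore` is authoritative. The class `JobLimiter` and its method names are made up for illustration and are not part of the original setup.

```java
import java.util.concurrent.Semaphore;

// Hypothetical helper: allows at most five jobs to run at the same time.
class JobLimiter {
    private static final Semaphore SLOTS = new Semaphore(5);

    /**
     * Runs the job only if one of the five slots is free.
     * Returns false without running it otherwise, so the caller can
     * leave the job 'in queue' for the cron job to pick up later.
     */
    static boolean runIfSlotFree(Runnable job) {
        if (!SLOTS.tryAcquire()) {
            return false;              // five jobs are already executing
        }
        try {
            job.run();
            return true;
        } finally {
            SLOTS.release();           // always give the slot back
        }
    }

    /** Number of slots that are currently unused. */
    static int freeSlots() {
        return SLOTS.availablePermits();
    }
}
```

Because `tryAcquire()` checks and claims a slot in one atomic step, two requests can no longer both observe "four executing jobs" and both start.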

You can use record locking to control concurrency. One way to do it is by executing a "select for update" query.

Your application must have another table that stores the worker count (worker_cnt). Then your servlet must do the following:

  1. Get the database connection.

  2. Turn off auto-commit.

  3. Insert the job with 'IN QUEUE' status.

  4. Execute the "select worker_cnt from ... for update" query.

(At this point other users that execute the same query will have to wait until we commit.)

  5. Read the worker_cnt value.

  6. If worker_cnt >= 5, commit and quit.

(At this point you have the ticket to execute the job, but other users are still waiting.)

  7. Update the job to 'EXECUTING'.

  8. Increment worker_cnt.

  9. Commit.

(At this point other users can continue their query and will get the updated worker_cnt.)

  10. Execute the job.

  11. Update the job to 'FINISHED'.

  12. Decrement worker_cnt.

  13. Commit again.

  14. Close the database connection.
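The locking part of the steps above could look roughly like this with plain JDBC. This is only a sketch under assumptions: the table names `jobs` and `worker_count`, the columns `id` and `status`, and the class `JobStarter` are invented for the example (only `worker_cnt` comes from the answer), the job row is assumed to have been inserted already, and the `FOR UPDATE` syntax must be supported by your database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper; table and column names are assumptions.
class JobStarter {
    static final int MAX_WORKERS = 5;

    /** Returns true if the job was switched to EXECUTING, false if it stays queued. */
    static boolean tryStartJob(Connection con, long jobId) throws SQLException {
        con.setAutoCommit(false);
        try {
            int workers;
            // Locks the counter row; concurrent callers block here until we commit.
            try (PreparedStatement lock = con.prepareStatement(
                     "SELECT worker_cnt FROM worker_count WHERE id = 1 FOR UPDATE");
                 ResultSet rs = lock.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("worker_count row is missing");
                }
                workers = rs.getInt(1);
            }
            if (workers >= MAX_WORKERS) {
                con.commit();          // release the lock, job stays 'IN QUEUE'
                return false;
            }
            execute(con, "UPDATE jobs SET status = 'EXECUTING' WHERE id = ?", jobId);
            execute(con, "UPDATE worker_count SET worker_cnt = worker_cnt + 1 WHERE id = 1");
            con.commit();              // others now see the incremented counter
            return true;
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }

    /** Marks the job as finished and frees its slot in one transaction. */
    static void finishJob(Connection con, long jobId) throws SQLException {
        con.setAutoCommit(false);
        try {
            execute(con, "UPDATE jobs SET status = 'FINISHED' WHERE id = ?", jobId);
            execute(con, "UPDATE worker_count SET worker_cnt = worker_cnt - 1 WHERE id = 1");
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }

    private static void execute(Connection con, String sql, Object... args)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (int i = 0; i < args.length; i++) {
                ps.setObject(i + 1, args[i]);
            }
            ps.executeUpdate();
        }
    }
}
```

The servlet would call `tryStartJob`, run the job only when it returns true, and call `finishJob` afterwards, ideally in a `finally` block so that a crashed job still releases its slot.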

Edit: I understand your question now, so here is another response :)

Yes, you could have race conditions. You could use a database lock to handle them. If the record is rarely accessed concurrently, look at optimistic locking. If the record is often accessed concurrently, look at pessimistic locking.

Guy Grin is right: what you are describing is a mutual exclusion situation that can be solved with semaphores. This construct by Dijkstra should solve your problem.

This construct is usually intended for code that may be executed by only one process at a time. Typical situations are exactly what you seem to be facing, e.g. database transactions that need to make sure you do not run into lost updates or dirty reads. Why exactly do you want 5 simultaneous executions? Are you sure you do not run into exactly those problems when you allow simultaneous execution at all?

The basic idea is to have a so-called critical section in your code that has to be protected from race conditions, i.e. that needs mutual exclusion handling. This part of your code is marked critical, and before it executes, it tells other parties that also want to call it to wait(). As soon as it is done doing its magic it calls notify(), and an internal handler then allows the next process in line to execute the critical section.

But:

  • I highly recommend not implementing ANY mutual exclusion handling approach yourself. In a theoretical computer science class some years ago we analyzed these constructions at the OS level and proved what can go wrong. Even if it looks simple at first glance, there is more to it than meets the eye, and depending on the language it is really hard to get right if you do it yourself, especially in Java and related languages where you have no control over what the underlying VM is doing. Instead, there are ready-made, out-of-the-box solutions that are already tested and proven correct.

  • Before handling mutual exclusion in a production environment, read a bit about it and be sure to understand what it implies. For example, there is The Little Book of Semaphores, which is a well-written and pleasant reference. At least have a glance at it.

I am not quite sure about Java Servlets, but Java does have an out-of-the-box solution for mutual exclusion: the keyword synchronized, which marks critical sections in your code that must not be executed simultaneously by several threads. There is no need for external libraries.

A nice code sample is provided in this earlier post on SO. Although it is already stated there, let me remind you to really use notifyAll() if you handle several producers/consumers; otherwise weird things will happen, and wild processes spinning in starvation will come and kill your cat.

Another, larger tutorial on the topic can be found here.
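To make the synchronized / wait() / notifyAll() idea concrete, here is a small sketch of a counter that admits at most a fixed number of callers. The class `ExecutionSlots` is invented for this illustration; in real code the ready-made `java.util.concurrent.Semaphore` does the same thing and is the safer choice, in line with the warning above.

```java
// Hypothetical class: a hand-rolled counting guard built on an intrinsic lock.
class ExecutionSlots {
    private final int max;
    private int running = 0;

    ExecutionSlots(int max) {
        this.max = max;
    }

    /** Blocks until a slot is free, then claims it. */
    synchronized void acquire() throws InterruptedException {
        while (running >= max) {   // a loop, not an if: guards against spurious wakeups
            wait();
        }
        running++;
    }

    /** Non-blocking variant: claims a slot only if one is free right now. */
    synchronized boolean tryAcquire() {
        if (running >= max) {
            return false;
        }
        running++;
        return true;
    }

    /** Frees a slot and wakes every waiting thread so one of them can claim it. */
    synchronized void release() {
        running--;
        notifyAll();
    }

    synchronized int running() {
        return running;
    }
}
```

All four methods synchronize on the same object, so the check `running >= max` and the increment can never be interleaved with another thread's check.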

As other people have responded, this situation calls for a semaphore or mutex. The one area where I think you may want to be careful is where the authoritative mutex lives. Depending on the situation, you could have several different optimal solutions (trading off security versus performance/complexity):

a) If you will have only one server (non-clustered), and the only way the database is modified is through your servlet, then you could implement a static in-memory mutex (some common object that you can synchronize access against). This will have the least impact on performance and would be the easiest to maintain (because all the relevant code is in your project). Also, it doesn't depend on the idiosyncrasies of the specific database you are using. It also allows you to lock access to non-database objects.

b) If you will have several separate servers, but they are all instances of your code, you could implement a synchronization service that allows a specific instance to obtain the lock (probably with a timeout) before it is allowed to update the database. This will be a bit more complex, but all the logic will still reside in your code, and the solution will be portable across database types.

c) If your database can be updated either by your server or by a different back-end process (for example an ETL), then the only way is to implement record-level locking in the DB. If you do this, you will depend on the specific type of support your database provides, and you will probably need changes if you happen to port to a different DB. In my opinion, this is the most complex, least maintainable option, and it should only be taken if the conditions for c) are unambiguously true.
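Option a) could be sketched as follows. Everything here is hypothetical: `JobRegistry` is an invented class, and the in-memory map merely stands in for the jobs table so the example is self-contained. In the real servlet the count and the status update inside the `synchronized` block would be database calls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a static in-memory mutex guarding check-then-act.
class JobRegistry {
    private static final Object LOCK = new Object();
    private static final int MAX_EXECUTING = 5;
    // Stand-in for the jobs table (job id -> status).
    private static final Map<Long, String> STATUS = new HashMap<>();

    static void enqueue(long jobId) {
        synchronized (LOCK) {
            STATUS.put(jobId, "IN QUEUE");
        }
    }

    /** Counts executing jobs and claims a slot in one indivisible step. */
    static boolean tryStart(long jobId) {
        synchronized (LOCK) {
            long executing = STATUS.values().stream()
                    .filter("EXECUTING"::equals)
                    .count();
            if (executing >= MAX_EXECUTING) {
                return false;              // stays 'IN QUEUE'
            }
            STATUS.put(jobId, "EXECUTING");
            return true;
        }
    }

    static void finish(long jobId) {
        synchronized (LOCK) {
            STATUS.put(jobId, "FINISHED");
        }
    }
}
```

This only works while a single JVM is the sole writer. As soon as a second server or a back-end process touches the same rows, the lock has to move out of the process, as in options b) and c).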

The answer is implicit in your question: your requests have to be enqueued, so build a FIFO queue with producers and consumers.

The servlet always adds jobs to the queue (optionally checking whether it is full), and 5 other threads each extract one job at a time, or sleep if the queue is empty.

There's no need to use cron or a mutex for this. Just remember to synchronize the queue, or the consumers may extract the same job twice.
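A sketch of that producer/consumer layout, assuming a single JVM. `JobQueue` is an invented name. `LinkedBlockingQueue` is already thread-safe, so no extra synchronization is needed to keep two consumers from taking the same job.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: servlets are the producers, a fixed set of threads the consumers.
class JobQueue {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    JobQueue(int consumers) {
        for (int i = 0; i < consumers; i++) {
            Thread t = new Thread(this::consume, "job-consumer-" + i);
            t.setDaemon(true);             // do not keep the JVM alive on shutdown
            t.start();
        }
    }

    /** Called by the servlet: enqueue and return immediately. */
    void submit(Runnable job) {
        queue.add(job);
    }

    private void consume() {
        try {
            while (true) {
                Runnable job = queue.take();   // sleeps while the queue is empty
                try {
                    job.run();
                } catch (RuntimeException e) {
                    e.printStackTrace();       // a failed job must not kill the consumer
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop consuming when interrupted
        }
    }
}
```

With `new JobQueue(5)`, at most five jobs run at once because only five threads ever call `take()`. Note that the queue lives in memory, so queued jobs are lost on a restart unless their 'in queue' rows are reloaded from the database.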

In my opinion, even if you don't use ExecutorService, it will be easiest to achieve your logic if you always update the database and start your jobs from a single thread. You can arrange the execution of your jobs in a queue and have one thread execute them and update the database status accordingly.

If you want to control the number of jobs executing, one way to do this is to use an ExecutorService with a fixed thread pool of 5. This way you will know for sure that at most 5 jobs will be executing at once. All other jobs will be queued within the ExecutorService.

Some of my colleagues will point you to low-level concurrency APIs. I believe that these are not meant for fixing general programming issues. Whatever you decide to do, try to use a higher-level API and don't dig into the details. Most of the low-level stuff is already implemented within the existing frameworks, and I doubt you will do it better.
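A minimal sketch of the fixed-thread-pool approach. `JobRunner` is a made-up wrapper; `Executors.newFixedThreadPool(5)` is the standard JDK call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical wrapper around a pool that never runs more than five jobs at once.
class JobRunner {
    private final ExecutorService pool = Executors.newFixedThreadPool(5);

    /** Queues the job; it starts as soon as one of the five threads is free. */
    Future<?> submit(Runnable job) {
        return pool.submit(job);
    }

    /** Stops accepting new jobs; already queued jobs still run to completion. */
    void shutdown() {
        pool.shutdown();
    }
}
```

The servlet would keep one `JobRunner` for the whole application (for example, created in a `ServletContextListener`) and call `shutdown()` when the application stops. The job itself can update its status to 'executing' at the start of `run()` and to 'finished' at the end.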

Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.