Why does Python threading.Condition() notify() require a lock?

Question

My question is specifically about why it was designed this way, given the unnecessary performance implication.

When thread T1 has this code:

cv.acquire()
cv.wait()
cv.release()

and thread T2 has this code:

cv.acquire()
cv.notify()  # requires that lock be held
cv.release()

what happens is that T1 waits and releases the lock, then T2 acquires it and notifies cv, which wakes up T1. Now there is a race condition between T2's release and T1's reacquiring after returning from wait(). If T1 tries to reacquire first, it will be unnecessarily resuspended until T2's release() is completed.

Note: I'm intentionally not using the with statement, to better illustrate the race with explicit calls.

This seems like a design flaw. Is there any known rationale for this, or am I missing something?

Answer 1

This is not a definitive answer, but it is meant to cover the relevant details I've managed to gather about this problem.

First, Python's threading implementation is based on Java's. Java's Condition.signal() documentation reads:

An implementation may (and typically does) require that the current thread hold the lock associated with this Condition when this method is called.

Now, the question was why this behavior is enforced in Python in particular. But first I want to cover the pros and cons of each approach.

As to why some think it's often a better idea to hold the lock, I found two main arguments:

From the moment a waiter acquire()s the lock (that is, before releasing it in wait()), it is guaranteed to be notified of signals. If the corresponding release() happened prior to signalling, this would allow the sequence (where P=Producer and C=Consumer) P: release(); C: acquire(); P: notify(); C: wait(), in which case the wait() corresponding to the acquire() of the same flow would miss the signal. There are cases where this doesn't matter (and it could even be considered more accurate), but there are cases where it is undesirable. This is one argument.
When you notify() outside a lock, this may cause a scheduling priority inversion; that is, a low-priority thread might end up taking priority over a high-priority thread. Consider a work queue with one producer and two consumers (LC=Low-priority consumer and HC=High-priority consumer), where LC is currently executing a work item and HC is blocked in wait().

The following sequence may occur:

P                    LC                    HC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     execute(item)                   (in wait())
lock()                                  
wq.push(item)
release()
                     acquire()
                     item = wq.pop()
                     release();
notify()
                                                     (wake-up)
                                                     while (wq.empty())
                                                       wait();

Whereas if the notify() happened before release(), LC wouldn't have been able to acquire() before HC had been woken up. This is where the priority inversion occurred. This is the second argument.

The argument in favor of notifying outside of the lock is for high-performance threading, where a thread need not go back to sleep just to wake up again the very next time slice it gets, which is the scenario already described in my question.

Python's `threading` Module

In Python, as I said, you must hold the lock while notifying. The irony is that the internal implementation does not allow the underlying OS to avoid priority inversion, because it enforces a FIFO order on the waiters. Of course, the fact that the order of waiters is deterministic could come in handy. But the question remains why such a thing should be enforced, when it could be argued that it would be more precise to differentiate between the lock and the condition variable: in some flows that require optimized concurrency and minimal blocking, acquire() should not by itself register a preceding waiting state; only the wait() call itself should.
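To make the requirement concrete, here is a minimal sketch of what happens when notify() is called with and without the lock held. The helper names notify_without_lock and notify_with_lock are made up for this illustration; the RuntimeError itself is the behavior described in the threading documentation.

```python
import threading


def notify_without_lock():
    """Call notify() without holding the condition's lock."""
    cv = threading.Condition()
    try:
        cv.notify()  # the lock has not been acquired
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"
    return "no error"


def notify_with_lock():
    """Call notify() while holding the condition's lock."""
    cv = threading.Condition()
    with cv:
        cv.notify()  # nobody is waiting, so this is a no-op
    return "ok"


print(notify_without_lock())
print(notify_with_lock())
```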

Arguably, Python programmers would not care about performance to this extent anyway, although that still doesn't answer the question of why, when implementing a standard library, one should not allow several standard behaviors to be possible.

One thing that remains to be said is that the developers of the threading module might have specifically wanted a FIFO order for some reason, found that this was somehow the best way of achieving it, and wanted to establish that as a Condition at the expense of the other (probably more prevalent) approaches. For this, they deserve the benefit of the doubt until they account for it themselves.

Answer 2

There are several reasons which are compelling (when taken together).

1. The notifier needs to take a lock

Pretend that Condition.notifyUnlocked() exists.

The standard producer/consumer arrangement requires taking locks on both sides:

def unlocked(qu,cv):  # qu is a thread-safe queue
  qu.push(make_stuff())
  cv.notifyUnlocked()
def consume(qu,cv):
  with cv:
    while True:       # vs. other consumers or spurious wakeups
      if qu: break
      cv.wait()
    x=qu.pop()
  use_stuff(x)

This fails because both the push() and the notifyUnlocked() can intervene between the if qu: and the wait().

Writing either of

def lockedNotify(qu,cv):
  qu.push(make_stuff())
  with cv: cv.notify()
def lockedPush(qu,cv):
  x=make_stuff()      # don't hold the lock here
  with cv: qu.push(x)
  cv.notifyUnlocked()

works (which is an interesting exercise to demonstrate). The second form has the advantage of removing the requirement that qu be thread-safe, but then it costs no more locks to hold it around the call to notify() as well.
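As a runnable counterpart to the second form, the sketch below uses the real threading.Condition with the notification issued while the lock is still held. It assumes a plain collections.deque as the queue (protected only by the condition's lock), and make_stuff, produce, consume and run are illustrative names, not part of any library.

```python
import threading
from collections import deque


def make_stuff(i):
    # Hypothetical stand-in for real work done outside the lock.
    return i * i


def produce(qu, cv, n):
    for i in range(n):
        x = make_stuff(i)   # build the item without holding the lock
        with cv:
            qu.append(x)    # the deque is protected by cv's lock
            cv.notify()     # notify while the lock is still held


def consume(qu, cv, n, out):
    for _ in range(n):
        with cv:
            while not qu:   # re-check: other consumers or spurious wakeups
                cv.wait()
            x = qu.popleft()
        out.append(x)       # use the item outside the lock


def run(n=5):
    qu, cv, out = deque(), threading.Condition(), []
    consumer = threading.Thread(target=consume, args=(qu, cv, n, out))
    producer = threading.Thread(target=produce, args=(qu, cv, n))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return out


print(run())
```

Because the emptiness check, the wait() and the append all happen under the same lock, a push can no longer slip in between the check and the wait.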

It remains to explain the preference for doing so, especially given that (as you observed) CPython does wake up the notified thread to have it switch to waiting on the mutex (rather than simply moving it to that wait queue).

2. The condition variable itself needs a lock

The Condition has internal data that must be protected in case of concurrent waits/notifications. (Glancing at the CPython implementation, I see the possibility that two unsynchronized notify()s could erroneously target the same waiting thread, which could cause reduced throughput or even deadlock.) It could protect that data with a dedicated lock, of course; since we need a user-visible lock already, using that one avoids additional synchronization costs.

3. Multiple wake conditions can need the lock

(Adapted from a comment on the blog post linked below.)

def setSignal(box,cv):
  signal=False
  with cv:
    if not box.val:
      box.val=True
      signal=True
  if signal: cv.notifyUnlocked()
def waitFor(box,v,cv):
  v=bool(v)   # to use ==
  while True:
    with cv:
      if box.val==v: break
      cv.wait()

Suppose box.val is False and thread #1 is waiting in waitFor(box,True,cv). Thread #2 calls setSignal; when it releases cv, #1 is still blocked on the condition. Thread #3 then calls waitFor(box,False,cv), finds that box.val is True, and waits. Then #2 calls notify(), waking #3, which is still unsatisfied and blocks again. Now #1 and #3 are both waiting, despite the fact that one of them must have its condition satisfied.

def setTrue(box,cv):
  with cv:
    if not box.val:
      box.val=True
      cv.notify()

Now that situation cannot arise: either #3 arrives before the update and never waits, or it arrives during or after the update and has not yet waited, guaranteeing that the notification goes to #1, which returns from waitFor.
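A runnable sketch of the locked variant, using only the real threading.Condition, might look like this. The Box class and the names set_true, wait_for and run are assumptions made for the example; the waiter is written with the usual while loop around wait().

```python
import threading


class Box:
    """Hypothetical container for the shared flag."""

    def __init__(self):
        self.val = False


def set_true(box, cv):
    with cv:
        if not box.val:
            box.val = True
            cv.notify()      # issued while the lock is held


def wait_for(box, v, cv):
    v = bool(v)
    with cv:
        while box.val != v:  # predicate is checked under the lock
            cv.wait()


def run():
    box, cv = Box(), threading.Condition()
    t = threading.Thread(target=wait_for, args=(box, True, cv))
    t.start()
    set_true(box, cv)
    t.join(timeout=5)
    return box.val, t.is_alive()


print(run())
```

Whether the waiter starts before or after set_true() runs, it either sees the updated flag immediately or is already registered as a waiter when notify() is called.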

4. The hardware might need a lock

With wait morphing and no GIL (in some alternate or future implementation of Python), the memory ordering (cf. Java's rules) imposed by the lock release after notify() and the lock acquire on return from wait() might be the only guarantee that the notifying thread's updates are visible to the waiting thread.

5. Real-time systems might need it

Immediately after the POSIX text you quoted we find:

however, if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal().

One blog post contains further discussion of the rationale and history of this recommendation (as well as of some of the other issues here).

Answer 3

What happens is that T1 waits and releases the lock, then T2 acquires it, notifies cv which wakes up T1.

Not quite. The cv.notify() call does not wake the T1 thread: it only moves it to a different queue. Before the notify(), T1 was waiting for the condition to be true. After the notify(), T1 is waiting to acquire the lock. T2 does not release the lock, and T1 does not "wake up", until T2 explicitly calls cv.release().

Answer 4

A couple of months ago exactly the same question occurred to me. But since I had ipython open, looking at the output of threading.Condition.wait?? (the source of the method) let me answer it myself fairly quickly.

In short, the wait method creates another lock called waiter, acquires it, appends it to a list and then, surprise, releases the lock on itself. After that it acquires the waiter once again; that is, it starts to wait until someone releases the waiter. Then it acquires the lock on itself again and returns.

The notify method pops a waiter from the waiter list (the waiter is a lock, as we remember) and releases it, allowing the corresponding wait method to continue.

The trick is that the wait method does not hold the lock on the condition itself while waiting for the notify method to release the waiter.
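The scheme described above can be sketched in a few lines. This is a deliberately simplified illustration, not CPython's actual source: MiniCondition and demo are made-up names, there is no timeout or notify_all, and a plain threading.Lock is assumed as the underlying lock.

```python
import threading


class MiniCondition:
    """Simplified sketch of the waiter-lock scheme (not CPython's code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = []

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, *exc):
        self._lock.release()

    def wait(self):
        waiter = threading.Lock()
        waiter.acquire()              # first acquire succeeds immediately
        self._waiters.append(waiter)  # safe: the caller holds self._lock
        self._lock.release()          # give up the condition's own lock
        try:
            waiter.acquire()          # blocks until notify() releases it
        finally:
            self._lock.acquire()      # reacquire before returning

    def notify(self):
        # The list is only safe to touch because the caller holds self._lock.
        if self._waiters:
            self._waiters.pop(0).release()


def demo():
    cond, state, started = MiniCondition(), {"ready": False}, threading.Event()

    def waiter():
        with cond:
            started.set()
            while not state["ready"]:
                cond.wait()
            state["seen"] = True

    t = threading.Thread(target=waiter)
    t.start()
    started.wait()
    with cond:
        state["ready"] = True
        cond.notify()
    t.join(timeout=5)
    return state.get("seen", False)


print(demo())
```

Note that the waiter list has no lock of its own in this sketch; it relies on callers of wait() and notify() holding the condition's lock, which is one reason the lock is required.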

UPD1: I seem to have misunderstood the question. Is it correct that you are bothered that T1 might try to reacquire the lock on itself before T2 releases it?

But is that possible in the context of Python's GIL? Or do you think that one can insert an IO call before releasing the condition, which would allow T1 to wake up and wait forever?

Answer 5

It's explained in the Python 3 documentation: https://docs.python.org/3/library/threading.html#condition-objects

Note: the notify() and notify_all() methods don't release the lock; this means that the thread or threads awakened will not return from their wait() call immediately, but only when the thread that called notify() or notify_all() finally relinquishes ownership of the lock.
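The documented behavior can be observed with a small experiment. In the sketch below (run_demo and the event strings are made up for illustration), the notifier keeps holding the lock for a short while after notify(), and the waiter only records its return from wait() after the lock has been released.

```python
import threading
import time


def run_demo():
    cv = threading.Condition()
    events = []
    state = {"ready": False}
    waiting = threading.Event()

    def waiter():
        with cv:
            waiting.set()              # set while holding the lock
            while not state["ready"]:
                cv.wait()              # releases the lock while blocked
            events.append("waiter: wait() returned")

    def notifier():
        waiting.wait()
        with cv:                       # succeeds once the waiter is inside wait()
            state["ready"] = True
            cv.notify()
            events.append("notifier: notify() called")
            time.sleep(0.05)           # keep holding the lock for a while
            events.append("notifier: releasing lock")

    threads = [threading.Thread(target=waiter), threading.Thread(target=notifier)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


for line in run_demo():
    print(line)
```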

Answer 6

There is no race condition; this is how condition variables work.

When wait() is called, the underlying lock is released until a notification occurs. It is guaranteed that the caller of wait() will reacquire the lock before the function returns (i.e., after the wait completes).

You're right that there could be some inefficiency if T1 were woken up directly when notify() is called. However, condition variables are typically implemented via OS primitives, and the OS will often be smart enough to realize that T2 still has the lock, so it won't immediately wake up T1 but will instead queue it to be woken.

Additionally, in Python this doesn't really matter anyway, as only one thread executes Python bytecode at a time due to the GIL, so the threads wouldn't be able to run concurrently in any case.

It's also preferable to use the following forms instead of calling acquire/release directly:

with cv:
    cv.wait()

And:

with cv:
    cv.notify()

This ensures that the underlying lock is released even if an exception occurs.
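A quick way to check this claim is sketched below. The function name is made up, and a plain threading.Lock is passed to Condition on purpose: with the default RLock, the same thread could reacquire the lock even if it were still held, which would make the check meaningless.

```python
import threading


def lock_released_after_exception():
    # Assumption for the demo: use a non-reentrant Lock as the underlying lock.
    cv = threading.Condition(threading.Lock())
    try:
        with cv:
            raise ValueError("simulated failure while holding the lock")
    except ValueError:
        pass
    free = cv.acquire(blocking=False)  # True only if the lock was released
    if free:
        cv.release()
    return free


print(lock_released_after_exception())
```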

6 answers

Answer 1: score 7, accepted, 2017-09-13 09:40:21
Answer 2: score 3, 2017-09-15 08:14:52
Answer 3: score 0, 2017-09-06 15:58:50
Answer 4: score 0, 2017-09-09 18:39:34
Answer 5: score 0, 2020-12-12 07:57:08
Answer 6: score -2, 2017-09-06 13:45:21
