Is list(dict.items()) thread-safe?

Is the usage of list(d.items()) in the example below safe?

import threading

n = 2000

d = {}

def dict_to_list():
    while True:
        list(d.items())  # is this safe to do?

def modify():
    for i in range(n):
        d[i] = i

if __name__ == "__main__":
    t1 = threading.Thread(target=dict_to_list, daemon=True)
    t1.start()

    t2 = threading.Thread(target=modify, daemon=True)
    t2.start()
    t2.join()

The background to this question is that an iterator over a dictionary item view checks on every step whether the dictionary size has changed, as the following example illustrates.

d = {}
view = d.items()  # this is an iterable
it = iter(view)  # this is an iterator
d[1] = 1
print(list(view))  # this is ok, it prints [(1, 1)]
print(list(it))  # this raises a RuntimeError because the size of the dictionary changed

So if the call to list(...) in the first example above can be interrupted (i.e., the thread t1 could release the GIL partway through), the first example might cause RuntimeErrors in thread t1. There are sources that claim the operation is not atomic; see here. However, I haven't been able to get the first example to crash.

I understand that the safe thing to do here would be to use locks instead of relying on the atomicity of certain operations. However, I'm debugging an issue in a third-party library that uses similar code and that I cannot necessarily change directly.

Short answer: it might be fine, but use a lock anyway.

Using dis you can see that list(d.items()) is effectively two bytecode instructions (the calls at offsets 6 and 8):

>>> import dis
>>> dis.dis("list(d.items())")
  1           0 LOAD_NAME                0 (list)
              2 LOAD_NAME                1 (d)
              4 LOAD_METHOD              2 (items)
              6 CALL_METHOD              0
              8 CALL_FUNCTION            1
             10 RETURN_VALUE

The Python FAQ says that (generally) things implemented in C are atomic (from the point of view of a running Python program):

What kinds of global value mutation are thread-safe?

In general, Python offers to switch among threads only between bytecode instructions; [...]. Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program.

[...]

For example, the following operations are all atomic [...]

 D.keys()

list() is implemented in C and d.items() is implemented in C, so each should be atomic unless one of the following holds: they end up calling out to Python code (which can happen if they call a dunder method that you overrode with a Python implementation), you're using a subclass of dict rather than a real dict, or their C implementation releases the GIL. It's not a good idea to rely on them being atomic.
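As a minimal sketch of the "use a lock anyway" advice, the snippet below guards both the write and the snapshot with one threading.Lock. The names d_lock and snapshot_items are hypothetical, introduced only for illustration. It assumes you control every piece of code that touches d, which may not be the case with a third-party library.

```python
import threading

d = {}
d_lock = threading.Lock()  # hypothetical name; every access to d must hold it


def snapshot_items():
    # Copy the items while holding the lock, so no writer can
    # change d in the middle of the copy.
    with d_lock:
        return list(d.items())


def modify(n):
    for i in range(n):
        # Writers take the same lock; otherwise the lock protects nothing.
        with d_lock:
            d[i] = i


if __name__ == "__main__":
    writer = threading.Thread(target=modify, args=(2000,))
    writer.start()
    partial = snapshot_items()  # consistent copy, even while the writer runs
    writer.join()
    print(len(partial), len(snapshot_items()))
```

The lock only helps if readers and writers both acquire it; a reader that locks while a writer does not gets no extra protection.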

You mention that an iterator created with iter() will raise an error if the underlying dictionary changes size, but that's not relevant here because .keys(), .values() and .items() return a view object, and views have no problem with the underlying object changing:

d = {"a": 1, "b": 2}
view = d.items()
print(list(view))  # [("a", 1), ("b", 2)]
d["c"] = 3         # this could happen in a different thread
print(list(view))  # [("a", 1), ("b", 2), ("c", 3)]

If you're modifying the dict in more than one instruction at a time, you'll sometimes see d in an inconsistent state where some of the modifications have been made and some haven't yet, but you shouldn't get a RuntimeError like you do with iter(), unless you modify it in a way that's non-atomic.

I suspect the author of that article was confused about dict views, thinking dict.items returns an iterator like dict.iteritems did in Python 2, not an iterable like it does in Python 3. Note that the article was written almost 13 years ago, five months before Python 3.0 was released. By the way, as PEP 3106 says (emphasis mine):

The original plan was to simply let .keys(), .values() and .items() return an iterator, i.e. exactly what iterkeys(), itervalues() and iteritems() return in Python 2.x.

In Python 2, iteritems gives an iterator:

>>> d = {1: 1, 2: 2, 3: 3}
>>> items = d.iteritems()
>>> items
<dictionary-itemiterator object at 0x0000000003EBA958>
>>> next(items)
(1, 1)
>>> list(items)
[(2, 2), (3, 3)]
>>> list(items)
[]

In Python 3, items gives an iterable, not an iterator:

>>> d = {1: 1, 2: 2, 3: 3}
>>> items = d.items()
>>> items
dict_items([(1, 1), (2, 2), (3, 3)])
>>> next(items)
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    next(items)
TypeError: 'dict_items' object is not an iterator
>>> list(items)
[(1, 1), (2, 2), (3, 3)]
>>> list(items)
[(1, 1), (2, 2), (3, 3)]

And in Python 2, with the iterator, this does cause the error:

>>> d = {1: 1, 2: 2, 3: 3}
>>> items = d.iteritems()
>>> d[4] = 4
>>> next(items)

Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    next(items)
RuntimeError: dictionary changed size during iteration

In Python 3, if d.items() did return an iterator, i.e., if it were equivalent to iter(d.items()), then it would be unsafe, because your thread might get interrupted between the creation of the iterator by iter() and its consumption by list(). But since it returns an iterable, it's the list() function itself that internally creates an iterator from the iterable, so both the iterator creation and its consumption happen during the same single bytecode instruction (the one executing the list() function).
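One way to see the difference is to count the call instructions each expression compiles to. The helper count_calls below is hypothetical, and it assumes a CPython version where call opcodes have names starting with "CALL" (for example CALL_METHOD, CALL_FUNCTION or CALL); opcode names are an implementation detail and differ between versions.

```python
import dis


def count_calls(expr):
    """Count bytecode instructions whose opcode name starts with "CALL"."""
    code = compile(expr, "<expr>", "eval")
    return sum(
        1 for ins in dis.get_instructions(code)
        if ins.opname.startswith("CALL")
    )


# Compiling does not evaluate the expression, so d need not exist here.
print(count_calls("list(d.items())"))
print(count_calls("list(iter(d.items()))"))
```

The explicit iter() adds one more call instruction, and the gap between that instruction and the list() call is where another thread can be scheduled and resize the dict.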

If you change your code to list(iter(d.items())) and increase n to, say, 20000000, then you'll likely get the error. Example from a run on Try it online!:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File ".code.tio", line 9, in dict_to_list
    list(iter(d.items()))  # is this safe to do?
RuntimeError: dictionary changed size during iteration
