Are there any reasons not to use an OrderedDict?

I'm referring to the OrderedDict from the collections module, which is an ordered dictionary.

It adds the ability to remember insertion order, which I realize is often not necessary, but even so, are there any downsides? Is it slower? Is it missing any functionality? I didn't see any missing methods.

In short, why shouldn't I always use this instead of a normal dictionary?

OrderedDict is a subclass of dict, and needs more memory to keep track of the order in which keys are added. This isn't trivial. The implementation adds a second dict under the covers, and a doubly-linked list of all the keys (that's the part that remembers the order), and a bunch of weakref proxies. It's not a lot slower, but at least doubles the memory over using a plain dict.
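A rough way to check the memory claim on your own interpreter is sketched below. It assumes CPython, where sys.getsizeof reports the size of the container itself (not of the keys and values it refers to); the helper name shallow_size and the element count are made up for illustration, and the exact numbers vary between Python versions.

```python
import sys
from collections import OrderedDict


def shallow_size(mapping):
    """Size in bytes of the container itself, as reported by sys.getsizeof."""
    return sys.getsizeof(mapping)


n = 10_000  # arbitrary number of entries, chosen only for illustration
plain = {i: None for i in range(n)}
ordered = OrderedDict((i, None) for i in range(n))

plain_size = shallow_size(plain)
ordered_size = shallow_size(ordered)

print("dict:       ", plain_size, "bytes")
print("OrderedDict:", ordered_size, "bytes")
```

Both containers hold the same keys and values, so any difference in the printed sizes comes from the extra bookkeeping that OrderedDict carries.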

But if it's appropriate, use it! That's why it's there :-)

How it works

The base dict is just an ordinary dict mapping keys to values - it's not "ordered" at all. When a <key, value> pair is added, the key is appended to a list. The list is the part that remembers the order.

But if this were a Python list, deleting a key would take O(n) time twice over: O(n) time to find the key in the list, and O(n) time to remove the key from the list.

So it's a doubly-linked list instead. That makes deleting a key a constant-time (O(1)) operation. But we still need to find the doubly-linked list node belonging to the key. To make that operation O(1) time too, a second - hidden - dict maps keys to nodes in the doubly-linked list.

So adding a new <key, value> pair requires adding the pair to the base dict, creating a new doubly-linked list node to hold the key, appending that new node to the doubly-linked list, and mapping the key to that new node in the hidden dict. A bit over twice as much work, but still O(1) (expected case) time overall.

Similarly, deleting a key that's present is also a bit over twice as much work but O(1) expected time overall: use the hidden dict to find the key's doubly-linked list node, delete that node from the list, and remove the key from both dicts.

Etc. It's quite efficient.
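The design described above can be sketched in a few lines of Python. This is an illustrative toy, not the actual standard-library code: the names SketchOrderedDict and _Node are made up, and the links here are ordinary strong references rather than the weakref proxies the real implementation uses.

```python
class _Node:
    """One entry in the circular doubly-linked list of keys."""
    __slots__ = ("key", "prev", "next")

    def __init__(self, key=None):
        self.key = key
        self.prev = self
        self.next = self


class SketchOrderedDict:
    """Toy ordered mapping: base dict + hidden dict + doubly-linked list."""

    def __init__(self):
        self._values = {}      # base dict: key -> value (unordered in spirit)
        self._nodes = {}       # hidden dict: key -> linked-list node
        self._root = _Node()   # sentinel; root.next is oldest, root.prev is newest

    def __setitem__(self, key, value):
        if key not in self._values:
            # New key: append a node at the end of the list, all in O(1).
            node = _Node(key)
            last = self._root.prev
            node.prev, node.next = last, self._root
            last.next = node
            self._root.prev = node
            self._nodes[key] = node
        self._values[key] = value  # existing keys keep their position

    def __getitem__(self, key):
        return self._values[key]

    def __delitem__(self, key):
        del self._values[key]          # raises KeyError if the key is absent
        node = self._nodes.pop(key)    # O(1) lookup of the node to unlink
        node.prev.next = node.next
        node.next.prev = node.prev

    def __iter__(self):
        node = self._root.next
        while node is not self._root:
            yield node.key
            node = node.next

    def __len__(self):
        return len(self._values)


d = SketchOrderedDict()
d["b"] = 1
d["a"] = 2
d["c"] = 3
del d["a"]       # unlinks the node for "a" without scanning the list
d["b"] = 10      # updating a value does not change the order
order = list(d)
print(order)     # ['b', 'c']
```

Every operation touches two dicts and a couple of pointers, which is where the "a bit over twice as much work" estimate comes from.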

Multithreading

This matters if your dictionary is accessed from multiple threads without a lock, especially as a synchronisation point.

In CPython, individual vanilla dict operations are atomic, and operations on any type extended in Python are not.

In fact, I'm not even certain OrderedDict is thread-safe (without a lock), although I cannot discount the possibility that it was very carefully coded and satisfies the definition of reentrancy.
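If in doubt, the conservative option is to guard the shared OrderedDict with an explicit lock. The following is a minimal sketch of that pattern; the names counts, counts_lock and bump are hypothetical, and the read-modify-write inside the lock is the kind of compound operation that is not atomic on any dictionary type.

```python
import threading
from collections import OrderedDict

counts = OrderedDict()          # shared between all worker threads
counts_lock = threading.Lock()  # guards every access to `counts`


def bump(key, times):
    """Increment counts[key] `times` times, holding the lock for each update."""
    for _ in range(times):
        with counts_lock:
            # get-then-set is a compound operation, so it must be protected.
            counts[key] = counts.get(key, 0) + 1


workers = [threading.Thread(target=bump, args=("hits", 10_000)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(counts["hits"])  # 40000, because no update can be lost under the lock
```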

Lesser devils

memory usage if you create tons of these dictionaries

CPU usage if all your code does is munge these dictionaries

Since Python 3.7, all dictionaries are guaranteed to preserve insertion order. The Python contributors determined that making dict ordered would not have a negative performance impact. I don't know how the performance of OrderedDict compares to dict in Python >= 3.7, but I imagine they would be comparable since they are both ordered.

Note that there are still differences between the behaviour of OrderedDict and dict. See also: Will OrderedDict become redundant in Python 3.7?
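A few of those behavioural differences can be seen directly. The snippet below is a small illustration for Python 3.7 or later; the variable names are arbitrary.

```python
from collections import OrderedDict

a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])

# Equality between two OrderedDicts is order-sensitive ...
ordered_equal = (a == b)
# ... while equality between plain dicts ignores order,
plain_equal = (dict(a) == dict(b))
# and so does comparing an OrderedDict with a plain dict.
mixed_equal = (a == dict(b))
print(ordered_equal, plain_equal, mixed_equal)  # False True True

# OrderedDict has reordering methods that dict lacks.
a.move_to_end("x")               # move an existing key to the end
keys_after_move = list(a)
first_item = a.popitem(last=False)  # pop from the front instead of the back
print(keys_after_move, first_item)  # ['y', 'x'] ('y', 2)
```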

why shouldn't I always use this instead of a normal dictionary

In Python 2.7, normal OrderedDict usage will create reference cycles. So any use of OrderedDict requires the garbage collector to be enabled in order to free the memory. Yes, the garbage collector is on by default in CPython, but disabling it has its uses.

E.g. with CPython 2.7.14:

from __future__ import print_function

import collections
import gc

if __name__ == '__main__':
    d = collections.OrderedDict([('key', 'val')])
    gc.collect()
    del d
    gc.set_debug(gc.DEBUG_LEAK)
    gc.collect()
    for i, obj in enumerate(gc.garbage):
        print(i, obj)

outputs

gc: collectable <list 00000000033E7908>
gc: collectable <list 000000000331EC88>
0 [[[...], [...], 'key'], [[...], [...], 'key'], None]
1 [[[...], [...], None], [[...], [...], None], 'key']

Even if you just create an empty OrderedDict (d = collections.OrderedDict()) and don't add anything to it, or you explicitly try to clean it up by calling the clear method (d.clear() before del d), you will still get one self-referencing list:

gc: collectable <list 0000000003ABBA08>
0 [[...], [...], None]

This seems to have been the case since this commit removed the __del__ method in order to prevent OrderedDict from causing uncollectable cycles, which are arguably worse. As noted in the changelog for that commit:

Issue #9825: removed __del__ from the definition of collections.OrderedDict. This prevents user-created self-referencing ordered dictionaries from becoming permanently uncollectable GC garbage. The downside is that removing __del__ means that the internal doubly-linked list has to wait for GC collection rather than freeing memory immediately when the refcnt drops to zero.


Note that in Python 3, the fix for the same issue was made differently and uses weakref proxies to avoid cycles:

Issue #9825: Using __del__ in the definition of collections.OrderedDict made it possible for the user to create self-referencing ordered dictionaries which become permanently uncollectable GC garbage. Reinstated the Py3.1 approach of using weakref proxies so that reference cycles never get created in the first place.
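The effect of the weakref approach can be illustrated with a toy linked-list root in Python 3. This is only a sketch of the idea, not the standard-library code: the class Link and the function freed_without_gc are hypothetical names, and the result relies on CPython's reference counting.

```python
import gc
import weakref


class Link:
    """Toy list node; '__weakref__' in __slots__ allows weak references to it."""
    __slots__ = ("prev", "next", "key", "__weakref__")


def freed_without_gc(use_weakref):
    """Build an empty circular list and report whether it is freed by
    reference counting alone, with the cycle collector switched off."""
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        root = Link()
        root.key = None
        # Point the root at itself, either strongly or through a weak proxy.
        target = weakref.proxy(root) if use_weakref else root
        root.prev = root.next = target
        probe = weakref.ref(root)  # lets us observe whether root is still alive
        del target
        del root
        return probe() is None
    finally:
        if gc_was_enabled:
            gc.enable()
        gc.collect()  # clean up the strong-reference cycle, if one was made


strong_freed = freed_without_gc(use_weakref=False)
weak_freed = freed_without_gc(use_weakref=True)
print(strong_freed, weak_freed)  # False True on CPython
```

With strong links the node keeps itself alive until the garbage collector runs, which matches the Python 2.7 behaviour shown earlier; with weak proxies no cycle exists, so the memory is released as soon as the last outside reference goes away.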
