What is the time complexity of checking membership in dict.items()?

According to the documentation:

Keys views are set-like since their entries are unique and hashable. If all values are hashable, so that (key, value) pairs are unique and hashable, then the items view is also set-like. (Values views are not treated as set-like since the entries are generally not unique.) For set-like views, all of the operations defined for the abstract base class collections.abc.Set are available (for example, ==, <, or ^).
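As a quick illustration of the quoted passage, here is a minimal sketch of those set operations on views whose values are all hashable. The dictionaries a and b are made up for the example.

```python
from collections.abc import Set

# Two small example dictionaries (hypothetical data) with hashable values.
a = {"x": 1, "y": 2}
b = {"y": 2, "z": 3}

# Keys views and items views support set operators and return plain sets.
common_keys = a.keys() & b.keys()
common_items = a.items() & b.items()
differing_items = a.items() ^ b.items()

print(common_keys)       # {'y'}
print(common_items)      # {('y', 2)}
print(differing_items)   # the pairs present in exactly one dict

# Keys and items views register as collections.abc.Set; values views do not.
is_set_like = (
    isinstance(a.keys(), Set),
    isinstance(a.items(), Set),
    isinstance(a.values(), Set),
)
print(is_set_like)       # (True, True, False)
```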

So I did some testing with the following code:

from timeit import timeit

def membership(val, container):
    val in container

r = range(100000)
s = set(r)
d = dict.fromkeys(r, 1)
d2 = {k: [1] for k in r}
items_list = list(d2.items())

print('set'.ljust(12), end='')
print(timeit(lambda: membership(-1, s), number=1000))
print('dict'.ljust(12), end='')
print(timeit(lambda: membership(-1, d), number=1000))
print('d_keys'.ljust(12), end='')
print(timeit(lambda: membership(-1, d.keys()), number=1000))
print('d_values'.ljust(12), end='')
print(timeit(lambda: membership(-1, d.values()), number=1000))
print('\n*With hashable dict.values')
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, 1), d.items()), number=1000))
print('*With unhashable dict.values')
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, 1), d2.items()), number=1000))
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, [1]), d2.items()), number=1000))
print('\nitems_list'.ljust(12), end='')
print(timeit(lambda: membership((-1, [1]), items_list), number=1000))

With the output:

set         0.00034419999999998896
dict        0.0003307000000000171
d_keys      0.0004200000000000037
d_values    2.4773092

*With hashable dict.values
d_items     0.0004413000000003109
*With unhashable dict.values
d_items     0.00042879999999989593
d_items     0.0005549000000000248

items_list  3.5529328

As you can see, when the dict.values are all hashable (int),
the execution time for the membership test is similar to that of a set or d_keys,
because the items view is set-like.
The last two examples use dict.values holding unhashable objects (list),
so I assumed the execution time would be similar to that of a list.
However, they are still similar to that of a set.

Does this mean that even when the dict.values are unhashable objects,
the implementation of the items view is still very efficient,
resulting in O(1) time complexity for checking membership?

Am I missing something here?

EDITED per @chepner's comment: dict.fromkeys(r, [1]) -> {k: [1] for k in r}
EDITED per @MarkRansom's comment: another test case list(d2.items())

Short answer

The time complexity of membership testing in items views is O(1).

Pseudo-code for lookup

This is how the membership testing works:

def dictitems_contains(dictview, key_value_pair):
    d = dictview.mapping
    k, v = key_value_pair
    try:
        return d[k] == v
    except KeyError:
        return False

Actual code

Here's the C source code:

static int
dictitems_contains(_PyDictViewObject *dv, PyObject *obj)
{
    int result;
    PyObject *key, *value, *found;
    if (dv->dv_dict == NULL)
        return 0;
    if (!PyTuple_Check(obj) || PyTuple_GET_SIZE(obj) != 2)
        return 0;
    key = PyTuple_GET_ITEM(obj, 0);
    value = PyTuple_GET_ITEM(obj, 1);
    found = PyDict_GetItemWithError((PyObject *)dv->dv_dict, key);
    if (found == NULL) {
        if (PyErr_Occurred())
            return -1;
        return 0;
    }
    Py_INCREF(found);
    result = PyObject_RichCompareBool(found, value, Py_EQ);
    Py_DECREF(found);
    return result;
}
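The early returns in the C function can be observed from Python. The following is a small sketch with a made-up dictionary, not part of the CPython source: anything that is not a 2-tuple is reported as absent, while an unhashable key propagates the TypeError raised by the dict lookup.

```python
d = {1: "one", 2: "two"}   # hypothetical example data
items = d.items()

print((1, "one") in items)      # True: key found and values compare equal
print((1, "uno") in items)      # False: key found but values differ
print((3, "three") in items)    # False: key is missing
print(1 in items)               # False: not a tuple
print((1, "one", 0) in items)   # False: tuple does not have exactly 2 elements
print([1, "one"] in items)      # False: a list, not a tuple

# An unhashable *key* makes the underlying dict lookup fail with TypeError.
try:
    ([1], "one") in items
except TypeError as exc:
    print("TypeError:", exc)
```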

Timing evidence for O(1) complexity

We get the same constant lookup time regardless of the dictionary size (in these cases: 100, 1,000, and 10,000).

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(100))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92 nsec per loop

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(1_000))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92.2 nsec per loop

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(10_000))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92.1 nsec per loop
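The same measurement can be scripted with the timeit module instead of the command line. This is only a sketch: the helper name time_items_lookup and the repeat count are arbitrary choices, and absolute numbers will differ by machine and Python version.

```python
from timeit import timeit


def time_items_lookup(size, repeats=20_000):
    """Return the average seconds per `(key, value) in d.items()` check."""
    d = dict.fromkeys(range(size))
    probe = (size - 1, None)  # a pair that is present in the dict
    total = timeit(lambda: probe in d.items(), number=repeats)
    return total / repeats


for size in (100, 1_000, 10_000, 100_000):
    nanoseconds = time_items_lookup(size) * 1e9
    print(f"{size:>7} keys: {nanoseconds:.0f} ns per lookup")
```

If the lookup is O(1), the reported time per lookup should stay roughly flat as the number of keys grows.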

Evidence that lookup calls hash()

We can monitor hash calls by overriding __hash__() in a subclass:

class Int(int):
    def __hash__(self):
        print('Hash called')
        return hash(int(self))

Applying the monitoring tool shows that hashing occurs when the dictionary is created and again when doing membership testing on the items view:

>>> d = {Int(1): 'one'}
Hash called
>>> (Int(1), 'one') in d.items()
Hash called
True

Lookup in an instance of dict_items is an O(1) operation (though one with an arbitrarily large constant, related to the complexity of comparing values).


dictitems_contains doesn't simply try to hash the tuple and look it up in a set-like collection of key/value pairs.

(Note: all of the following links are just to different lines of dictitems_contains, if you don't want to click on them individually.)

To evaluate

(-1, [1]) in d2.items()

it first extracts the key from the tuple, then tries to find that key in the underlying dict. If that lookup fails, it immediately returns false. Only if the key is found does it then compare the value from the tuple to the value mapped to the key in the dict.

At no point does dictitems_contains need to hash the second element of the tuple.
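One way to check this claim is with a value type that counts equality comparisons and refuses to be hashed. Probe is a hypothetical helper written for this sketch, not anything from the standard library.

```python
class Probe:
    """Hypothetical value type that records comparisons and refuses to be hashed."""

    def __init__(self, payload):
        self.payload = payload
        self.eq_calls = 0

    def __eq__(self, other):
        self.eq_calls += 1
        return isinstance(other, Probe) and self.payload == other.payload

    def __hash__(self):
        raise TypeError("Probe objects must never be hashed")


stored = Probe("a")
d = {1: stored}   # only the key 1 is hashed when the dict is built

hit = (1, Probe("a")) in d.items()    # key found, so the values are compared
miss = (2, Probe("a")) in d.items()   # key missing, so no comparison happens

print(hit, miss, stored.eq_calls)     # True False 1
```

Neither lookup triggers Probe.__hash__, and the equality check runs only for the pair whose key is present.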

It's not clear in what ways an instance of dict_items is not set-like when the values are non-hashable, as mentioned in the documentation.
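One difference that can be observed, offered as a sketch rather than a full answer to that question: operations that must build a real set out of the (key, value) tuples have to hash them, and that fails for unhashable values even though membership testing works. The dictionary below mirrors d2 from the question at a smaller size.

```python
d2 = {k: [1] for k in range(3)}   # values are unhashable lists
items = d2.items()

# Membership only hashes the key and compares the value, so it works.
print((0, [1]) in items)   # True

# Building a set from the view has to hash every (key, value) tuple.
failures = []
for label, operation in (
    ("set(items)", lambda: set(items)),
    ("items | set()", lambda: items | set()),
):
    try:
        operation()
    except TypeError as exc:
        failures.append(label)
        print(f"{label} raised TypeError: {exc}")
```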


A simplified, pure-Python implementation of dict_items.__contains__ might look something like

class DictItems:
    def __init__(self, d):
        self.d = d

    def __contains__(self, t):
        key = t[0]
        value = t[1]
        try:
            dict_value = self.d[key]  # O(1) lookup
        except KeyError:
            return False
    
        return value == dict_value  # Arbitrarily expensive comparison

    ...

where d.items() returns DictItems(d).
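The standard library ships a pure-Python view that follows the same pattern, collections.abc.ItemsView, which can stand in for the sketch above when experimenting. The example below assumes only that it may be instantiated directly around a mapping. Unlike the C version, it unpacks its argument, so it is only exercised with 2-tuples here.

```python
from collections.abc import ItemsView

d2 = {k: [1] for k in range(5)}   # unhashable list values, as in the question
view = ItemsView(d2)              # pure-Python view wrapping the dict

present = (0, [1]) in view        # key found, values equal
wrong_value = (0, [2]) in view    # key found, values differ
missing_key = (-1, [1]) in view   # key not in the dict

print(present, wrong_value, missing_key)   # True False False
print(len(view))                           # 5

# With hashable values, the view also gets set operators from collections.abc.Set.
small = ItemsView({"a": 1})
print(small & {("a", 1), ("b", 2)})        # {('a', 1)}
```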
