
Python: Membership testing way slower with frozenset than sets, tuples and lists?

I have spent a few hours reading up on membership testing and its speed after falling down that rabbit hole. I thought I understood it until I ran my own little timeit test.

Here's the code:

import timeit

range_ = range(20, -1, -1)
w = timeit.timeit('0 in {seq}'.format(seq=list(range_)))
x = timeit.timeit('0 in {seq}'.format(seq=tuple(range_)))
y = timeit.timeit('0 in {seq}'.format(seq=set(range_)))
z = timeit.timeit('0 in {seq}'.format(seq=frozenset(range_)))
print('list:', w)
print('tuple:', x)
print('set:', y)
print('frozenset:', z)

And here are the results:

list: 0.3762843
tuple: 0.38087859999999996
set: 0.06568490000000005
frozenset: 1.5114070000000002

List and tuple taking the same time makes sense. I expected set and frozenset to take the same time as well, but frozenset is extremely slow, even compared to lists. Why?

Changing the code to the following still gives me similar results:

import timeit

list_ = list(range(20, -1, -1))
tuple_ = tuple(range(20, -1, -1))
set_ = set(range(20, -1, -1))
frozenset_ = frozenset(range(20, -1, -1))

w = timeit.timeit('0 in {seq}'.format(seq=list_))
x = timeit.timeit('0 in {seq}'.format(seq=tuple_))
y = timeit.timeit('0 in {seq}'.format(seq=set_))
z = timeit.timeit('0 in {seq}'.format(seq=frozenset_))

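It's not the membership test that is slow; it's the construction. timeit executes the statement string on every iteration, so in the frozenset case the timed statement itself contains a frozenset(...) call that rebuilds the object each time. That is also why your second attempt doesn't help: formatting a prebuilt object into the string still produces the same constructor call.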

Consider the following:

import timeit

list_ = list(range(20, -1, -1))
tuple_ = tuple(range(20, -1, -1))
set_ = set(range(20, -1, -1))
frozenset_ = frozenset(range(20, -1, -1))

w = timeit.timeit('0 in list_', globals=globals())
x = timeit.timeit('0 in tuple_', globals=globals())
y = timeit.timeit('0 in set_', globals=globals())
z = timeit.timeit('0 in frozenset_', globals=globals())

print('list:', w)
print('tuple:', x)
print('set:', y)
print('frozenset:', z)

I get the following timings with Python 3.5:

list: 0.28041897085495293
tuple: 0.2775509520433843
set: 0.0552431708201766
frozenset: 0.05547476885840297
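An alternative to globals=globals() is to build each container once in timeit's setup string, which also keeps the construction out of the timed loop. The minimal sketch below does that, and additionally times the construction part of the original frozenset statement on its own (repr() of the frozenset, exactly as the question's format call produced it). The helper name time_membership and the repeat count are arbitrary choices for illustration, not anything from the original answer.

```python
import timeit


def time_membership(number=100000):
    """Hypothetical helper: time '0 in seq' with seq built once in setup."""
    setup = "data = list(range(20, -1, -1))"
    builders = {
        "list": "list(data)",
        "tuple": "tuple(data)",
        "set": "set(data)",
        "frozenset": "frozenset(data)",
    }
    results = {}
    for name, build in builders.items():
        # The container is created in setup, outside the timed loop,
        # so only the membership test is measured.
        results[name] = timeit.timeit(
            "0 in seq", setup=setup + "\nseq = " + build, number=number
        )
    # The construction part of the original frozenset benchmark on its own:
    # repr() yields 'frozenset({0, 1, ..., 20})', which is re-evaluated every loop.
    rebuild_stmt = repr(frozenset(range(20, -1, -1)))
    results["frozenset construction"] = timeit.timeit(rebuild_stmt, number=number)
    return results


if __name__ == "__main__":
    for name, seconds in time_membership().items():
        print("{}: {:.4f}".format(name, seconds))
```

If the explanation above is right, the set and frozenset membership timings should be close to each other, and the construction-only timing should account for most of the gap seen in the question; exact numbers depend on the machine and Python version.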

Disassembling the statements you're benchmarking shows why the frozenset version is so much slower:

import dis

def print_dis(code):
    print('{code}:'.format(code=code))
    dis.dis(code)

range_ = range(20, -1, -1)
print_dis('0 in {seq}'.format(seq=list(range_)))
print_dis('0 in {seq}'.format(seq=tuple(range_)))
print_dis('0 in {seq}'.format(seq=set(range_)))
print_dis('0 in {seq}'.format(seq=frozenset(range_)))

Its output is pretty self-explanatory:

0 in [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]:
  1           0 LOAD_CONST               0 (0)
              3 LOAD_CONST              21 ((20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE
0 in (20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0):
  1           0 LOAD_CONST               0 (0)
              3 LOAD_CONST              21 ((20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE
0 in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}:
  1           0 LOAD_CONST               0 (0)
              3 LOAD_CONST              21 (frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE
0 in frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}):
  1           0 LOAD_CONST               0 (0)
              3 LOAD_NAME                0 (frozenset)
              6 LOAD_CONST               0 (0)
              9 LOAD_CONST               1 (1)
             12 LOAD_CONST               2 (2)
             15 LOAD_CONST               3 (3)
             18 LOAD_CONST               4 (4)
             21 LOAD_CONST               5 (5)
             24 LOAD_CONST               6 (6)
             27 LOAD_CONST               7 (7)
             30 LOAD_CONST               8 (8)
             33 LOAD_CONST               9 (9)
             36 LOAD_CONST              10 (10)
             39 LOAD_CONST              11 (11)
             42 LOAD_CONST              12 (12)
             45 LOAD_CONST              13 (13)
             48 LOAD_CONST              14 (14)
             51 LOAD_CONST              15 (15)
             54 LOAD_CONST              16 (16)
             57 LOAD_CONST              17 (17)
             60 LOAD_CONST              18 (18)
             63 LOAD_CONST              19 (19)
             66 LOAD_CONST              20 (20)
             69 BUILD_SET               21
             72 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             75 COMPARE_OP               6 (in)
             78 RETURN_VALUE

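This is because, of the four types you converted the range object into, frozenset is the only one whose Python 3 repr is not a literal but a constructor call. The list, tuple and set displays are folded by the compiler into a single tuple or frozenset constant (the LOAD_CONST lines above), so only the membership test itself is timed. The frozenset version, by contrast, has to look up the name frozenset (eventually found among the built-ins), build a temporary set from the 21 elements and call frozenset() on it every time the statement runs. Compare the reprs: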

>>> repr(list(range(3)))
'[0, 1, 2]'
>>> repr(tuple(range(3)))
'(0, 1, 2)'
>>> repr(set(range(3)))
'{0, 1, 2}'
>>> repr(frozenset(range(3)))
'frozenset({0, 1, 2})'  # requires a name lookup when evaluated by timeit
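One way to see that only the frozenset repr depends on a name is to evaluate each repr in a namespace with no built-ins available. This is a minimal demonstration sketch; emptying __builtins__ like this is a trick for showing the dependency, not a sandbox.

```python
# Evaluate each container's repr with an empty set of built-ins.
# Displays such as [...], (...) and {...} need no names; frozenset(...) does.
empty_builtins = {"__builtins__": {}}

for obj in (list(range(3)), tuple(range(3)), set(range(3)), frozenset(range(3))):
    text = repr(obj)
    try:
        value = eval(text, empty_builtins)
        print(text, "-> evaluates to", value)
    except NameError as exc:
        print(text, "-> NameError:", exc)
```

The first three reprs evaluate without any names, while the last one fails with a NameError because frozenset can only be found by looking it up.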

In Python 2, the repr of a set is also a constructor call that requires a name lookup, which is why @NPE reported in a comment that there is little difference in performance between a frozenset and a set in Python 2:

>>> repr(set(range(3)))
'set([0, 1, 2])'
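The exact bytecode differs between CPython versions (for example, 3.9 and later use CONTAINS_OP rather than COMPARE_OP for in), so rather than comparing full listings with the Python 3.5 disassembly earlier, you can inspect the compiled code object directly. A minimal sketch, assuming CPython: it compiles each statement and reports which constant types the compiler stored and which names must be looked up at run time. The helper name describe is hypothetical.

```python
# Assumes CPython: folding a list or set display in an 'in' test into a
# tuple or frozenset constant is a compiler optimisation, not a language rule.


def describe(stmt):
    """Hypothetical helper: constants vs. run-time names in a compiled statement."""
    code = compile(stmt, "<timeit-stmt>", "eval")
    # Type names of the constants stored in the code object.
    const_types = {type(c).__name__ for c in code.co_consts}
    # Names the statement must look up when it runs (e.g. 'frozenset').
    return const_types, code.co_names


if __name__ == "__main__":
    for seq in (list(range(3)), tuple(range(3)), set(range(3)), frozenset(range(3))):
        stmt = "0 in {!r}".format(seq)
        const_types, names = describe(stmt)
        print(stmt, sorted(const_types), names)
```

The list, tuple and set statements should need no names at all, while the frozenset statement lists 'frozenset' among its names, which is the lookup and call that the original benchmark repeated on every iteration.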
