I have been reading up for a few hours trying to understand membership testing and its speed, as I fell down that rabbit hole. I thought I had gotten it until I ran my own little timeit test.
Here's the code:
import timeit
range_ = range(20, -1, -1)
w = timeit.timeit('0 in {seq}'.format(seq=list(range_)))
x = timeit.timeit('0 in {seq}'.format(seq=tuple(range_)))
y = timeit.timeit('0 in {seq}'.format(seq=set(range_)))
z = timeit.timeit('0 in {seq}'.format(seq=frozenset(range_)))
print('list:', w)
print('tuple:', x)
print('set:', y)
print('frozenset:', z)
And here is the result:
list: 0.3762843
tuple: 0.38087859999999996
set: 0.06568490000000005
frozenset: 1.5114070000000002
List and tuple having the same time makes sense. I thought set and frozenset would have the same time as well, but frozenset is extremely slow, even compared to lists?
Changing the code to the following still gives me similar results:
list_ = list(range(20, -1, -1))
tuple_ = tuple(range(20, -1, -1))
set_ = set(range(20, -1, -1))
frozenset_ = frozenset(range(20, -1, -1))
w = timeit.timeit('0 in {seq}'.format(seq=list_))
x = timeit.timeit('0 in {seq}'.format(seq=tuple_))
y = timeit.timeit('0 in {seq}'.format(seq=set_))
z = timeit.timeit('0 in {seq}'.format(seq=frozenset_))
It's not the membership test that's taking the time; it's the construction.
Consider the following:
import timeit
list_ = list(range(20, -1, -1))
tuple_ = tuple(range(20, -1, -1))
set_ = set(range(20, -1, -1))
frozenset_ = frozenset(range(20, -1, -1))
w = timeit.timeit('0 in list_', globals=globals())
x = timeit.timeit('0 in tuple_', globals=globals())
y = timeit.timeit('0 in set_', globals=globals())
z = timeit.timeit('0 in frozenset_', globals=globals())
print('list:', w)
print('tuple:', x)
print('set:', y)
print('frozenset:', z)
I get the following timings with Python 3.5:
list: 0.28041897085495293
tuple: 0.2775509520433843
set: 0.0552431708201766
frozenset: 0.05547476885840297
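To check this directly, one can time the construction and the membership test separately. This is only a minimal sketch: the names `literal`, `build_only`, `test_only` and `build_and_test`, and the iteration count `N`, are my own choices for illustration and not part of the original benchmark.

```python
import timeit

# Source text equivalent to what the original benchmark formatted in for frozenset.
literal = repr(frozenset(range(20, -1, -1)))

N = 200000  # arbitrary iteration count, smaller than timeit's default

# Only build the frozenset; no membership test.
build_only = timeit.timeit(literal, number=N)
# Only the membership test; the frozenset is built once in setup.
test_only = timeit.timeit('0 in fs', setup='fs = ' + literal, number=N)
# Build and test on every iteration, as the original benchmark did.
build_and_test = timeit.timeit('0 in ' + literal, number=N)

print('build only:    ', build_only)
print('test only:     ', test_only)
print('build and test:', build_and_test)
```

If the explanation holds, `build_only` and `build_and_test` should be close to each other and much larger than `test_only`.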
The following demonstrates why frozenset is so much slower, by disassembling the code you're benchmarking:
import dis
def print_dis(code):
    print('{code}:'.format(code=code))
    dis.dis(code)
range_ = range(20, -1, -1)
print_dis('0 in {seq}'.format(seq=list(range_)))
print_dis('0 in {seq}'.format(seq=tuple(range_)))
print_dis('0 in {seq}'.format(seq=set(range_)))
print_dis('0 in {seq}'.format(seq=frozenset(range_)))
Its output is pretty self-explanatory:
0 in [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]:
1 0 LOAD_CONST 0 (0)
3 LOAD_CONST 21 ((20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0))
6 COMPARE_OP 6 (in)
9 RETURN_VALUE
0 in (20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0):
1 0 LOAD_CONST 0 (0)
3 LOAD_CONST 21 ((20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0))
6 COMPARE_OP 6 (in)
9 RETURN_VALUE
0 in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}:
1 0 LOAD_CONST 0 (0)
3 LOAD_CONST 21 (frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}))
6 COMPARE_OP 6 (in)
9 RETURN_VALUE
0 in frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}):
1 0 LOAD_CONST 0 (0)
3 LOAD_NAME 0 (frozenset)
6 LOAD_CONST 0 (0)
9 LOAD_CONST 1 (1)
12 LOAD_CONST 2 (2)
15 LOAD_CONST 3 (3)
18 LOAD_CONST 4 (4)
21 LOAD_CONST 5 (5)
24 LOAD_CONST 6 (6)
27 LOAD_CONST 7 (7)
30 LOAD_CONST 8 (8)
33 LOAD_CONST 9 (9)
36 LOAD_CONST 10 (10)
39 LOAD_CONST 11 (11)
42 LOAD_CONST 12 (12)
45 LOAD_CONST 13 (13)
48 LOAD_CONST 14 (14)
51 LOAD_CONST 15 (15)
54 LOAD_CONST 16 (16)
57 LOAD_CONST 17 (17)
60 LOAD_CONST 18 (18)
63 LOAD_CONST 19 (19)
66 LOAD_CONST 20 (20)
69 BUILD_SET 21
72 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
75 COMPARE_OP 6 (in)
78 RETURN_VALUE
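This is because, among the 4 data types you converted the range object into, frozenset is the only one in Python 3 whose repr is not a literal but a call that requires a name lookup. The name frozenset has to be looked up at run time through the enclosing namespaces (ending with the built-ins), so the compiler cannot fold the expression into a constant. As the BUILD_SET and CALL_FUNCTION instructions above show, every timed iteration therefore rebuilds the set and calls frozenset on it before the membership test even starts: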
>>> repr(list(range(3)))
'[0, 1, 2]'
>>> repr(tuple(range(3)))
'(0, 1, 2)'
>>> repr(set(range(3)))
'{0, 1, 2}'
>>> repr(frozenset(range(3)))
'frozenset({0, 1, 2})' # requires a name lookup when evaluated by timeit
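The same difference can be checked on the compiled code objects instead of by reading the disassembly. This is a small sketch that assumes CPython, since `co_names` and instruction counts are implementation details; `compile_expr` is a hypothetical helper introduced here.

```python
import dis

def compile_expr(source):
    """Compile a single expression the way eval() would."""
    return compile(source, '<expr>', 'eval')

range_ = range(20, -1, -1)
set_code = compile_expr('0 in {seq}'.format(seq=set(range_)))
frozenset_code = compile_expr('0 in {seq}'.format(seq=frozenset(range_)))

# Names the bytecode has to look up at run time.
print('set names:      ', set_code.co_names)
print('frozenset names:', frozenset_code.co_names)

# Number of bytecode instructions in each compiled expression.
set_ops = len(list(dis.get_instructions(set_code)))
frozenset_ops = len(list(dis.get_instructions(frozenset_code)))
print('set instructions:      ', set_ops)
print('frozenset instructions:', frozenset_ops)
```

On CPython 3 the set expression should report no names at all, while the frozenset expression should list `frozenset` and compile to more instructions. The exact counts vary between Python versions.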
In Python 2, sets also require a name lookup when converted by repr, which is why @NPE reported in the comments that there is little difference in performance between a frozenset and a set in Python 2:
>>> repr(set(range(3)))
'set([0, 1, 2])'