
How to compare strings in python 2 and 3?

I was playing around with some python and came up with this code:

import time
N = 10000000
t1 = time.time()
for _ in range(N):
    if 'lol' in ['lol']:
        pass
print(time.time() - t1)

t1 = time.time()
for _ in range(N):
    if 'lol' == 'lol':
        pass
print(time.time() - t1)

So, if I use Python 2:

(test) C:\Users\test>python test.py
0.530999898911
0.5

(test) C:\Users\test>python test.py
0.531000137329
0.5

(test) C:\Users\test>python test.py
0.528000116348
0.501000165939

That's nice: the second variant is quicker, so I should use 'lol' == 'lol', which is also the more Pythonic way to compare two strings. But what happens if I use Python 3:

(test) C:\Users\test>python3 test.py
0.37500524520874023
0.3880295753479004

(test) C:\Users\test>python3 test.py
0.3690001964569092
0.3780345916748047

(test) C:\User\test>python3 test.py
0.37799692153930664
0.38797974586486816

Using timeit:

(test) C:\Users\test>python3 -m timeit "'lol' in ['lol']"
100000000 loops, best of 3: 0.0183 usec per loop

(test) C:\Users\test>python3 -m timeit "'lol' == 'lol'"
100000000 loops, best of 3: 0.019 usec per loop

Oh my god! Why is the first variant quicker? So should I use an ugly style like 'lol' in ['lol'] when I use Python 3?

The bulk of your Python 2 time is spent constructing a huge list by calling range. Change it to xrange in Python 2, or use the timeit module, which is written to avoid this kind of measurement overhead. Once you've done that, you will not find an appreciable difference that would motivate writing strange-looking code.
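As a rough sketch of that advice (Python 3 syntax; the helper name best_time and the loop counts are my own choices, not something from the question), both expressions can be timed through the same timeit harness so that loop overhead is identical for each:

```python
import timeit

# Both statements go through the same harness, so the cost of the loop
# itself is the same for each and only the expression differs.
STATEMENTS = ["'lol' in ['lol']", "'lol' == 'lol'"]


def best_time(stmt, number=1_000_000, repeat=5):
    """Return the best per-loop time in microseconds for stmt.

    best_time is a hypothetical helper for this sketch; number and
    repeat are arbitrary defaults.
    """
    runs = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(runs) / number * 1e6


if __name__ == "__main__":
    for stmt in STATEMENTS:
        print(f"{stmt:20s} {best_time(stmt):.4f} usec per loop")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes. The absolute numbers will vary by machine and interpreter version.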

So should I use an ugly style like 'lol' in ['lol'] when I use Python 3?

No, readability counts.

Also, as others have noted, your test case has weaknesses:

$ python3 -m timeit "'lol' == 'lol'"
>> 10000000 loops, best of 3: 0.024 usec per loop
$ python3 -m timeit "'lol' in ['lol']"
>> 10000000 loops, best of 3: 0.0214 usec per loop
$ python2 -m timeit "'lol' == 'lol'"
>> 10000000 loops, best of 3: 0.0258 usec per loop
$ python2 -m timeit "'lol' in ['lol']"
>> 10000000 loops, best of 3: 0.0212 usec per loop

There is no difference between python2 and python3 when it comes to which comparison is faster.


Another source of confusion may be the opaque behavior of Python interpreters[1] when it comes to string caching/interning. As a rule of thumb, strings shorter than four characters are interned and will refer to the same object, although this is an implementation detail rather than a guarantee. It can be tested with something like:

a = 'lol'
b = 'lol'
a is b  # tests for object id instead of applying an equality comparison
>> True

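Other strings may also be interned, but an easy counterexample is a four-character string made of special characters, entered line by line in the interactive interpreter (in a script file, equal constants in the same code block may be merged, so the result can differ):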

a = '####'
b = '####'
a is b
>> False
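A small sketch of the same idea that does not depend on how the compiler treats literals: the two strings are built at run time (the join-based construction is only my illustration), and sys.intern is used to force them onto one shared object. Whether the uninterned strings happen to be the same object is implementation-dependent, so no result is claimed for that line.

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them
# as constants. The join-based construction is just for illustration.
a = "".join(["#", "#", "#", "#"])
b = "".join(["#", "#", "#", "#"])

print(a == b)  # equality compares contents
print(a is b)  # identity: implementation-dependent, do not rely on it

# sys.intern returns the canonical copy of a string, so two equal
# strings map to the same object once both are interned.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```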

Of course, testing for object ids is faster than making an actual comparison, and your test using in did just that: list membership checks identity first and only falls back to == when the objects differ. Even though the code itself looks straightforward, the actual operation was unexpected. That also means that slightly different scenarios may lead to surprising results and funny bugs.
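To make the identity shortcut visible, here is a minimal sketch using a made-up class, CountingEq, that counts calls to its __eq__ method. The class and its counter are hypothetical helpers for this illustration only.

```python
class CountingEq:
    """Hypothetical helper that counts how often __eq__ is called."""

    eq_calls = 0  # class-level counter shared by all instances

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingEq.eq_calls += 1
        return isinstance(other, CountingEq) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


x = CountingEq("lol")

# Same object on both sides: the list finds it by identity, no __eq__ call.
found_same = x in [x]
calls_after_identity = CountingEq.eq_calls

# Equal but distinct object: the identity check fails, so __eq__ must run.
found_equal = x in [CountingEq("lol")]
calls_after_equality = CountingEq.eq_calls

print(found_same, calls_after_identity)
print(found_equal, calls_after_equality)
```

The counter lives on the class rather than on the instance so that the result does not depend on which operand the interpreter picks as the left-hand side of the comparison.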

In conclusion, I'd repeat once more: no, you should not prefer 'lol' in ['lol'] over 'lol' == 'lol'.

[1]: Only CPython. I do not know whether other Python interpreters do something similar.
