
How to compare strings in python 2 and 3?

I was playing around with some python and came up with this code:

import time
N = 10000000
t1 = time.time()
for _ in range(N):
    if 'lol' in ['lol']:
        pass
print(time.time() - t1)

t1 = time.time()
for _ in range(N):
    if 'lol' == 'lol':
        pass
print(time.time() - t1)

So, if I use python2:

(test) C:\Users\test>python test.py
0.530999898911
0.5

(test) C:\Users\test>python test.py
0.531000137329
0.5

(test) C:\Users\test>python test.py
0.528000116348
0.501000165939

And it is nice: I like that the second variant is quicker, and I should use 'lol' == 'lol' as it is the more pythonic way to compare two strings. But what happens if I use python3:

(test) C:\Users\test>python3 test.py
0.37500524520874023
0.3880295753479004

(test) C:\Users\test>python3 test.py
0.3690001964569092
0.3780345916748047

(test) C:\Users\test>python3 test.py
0.37799692153930664
0.38797974586486816

Using timeit:

(test) C:\Users\test>python3 -m timeit "'lol' in ['lol']"
100000000 loops, best of 3: 0.0183 usec per loop

(test) C:\Users\test>python3 -m timeit "'lol' == 'lol'"
100000000 loops, best of 3: 0.019 usec per loop

Oh my god! Why is the first variant quicker? So should I use an ugly style like 'lol' in ['lol'] when I use python3?

The bulk of your python2 time is spent constructing a huge list by calling range. Change it to xrange in Python 2, or use the timeit module, which is properly written. Once you've done that, you will not find an appreciable difference that would motivate writing strange-looking code.
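As a rough illustration of the timeit suggestion, the two expressions from the question can be timed from a script rather than from the command line. This is only a sketch: the helper name best_time and the loop counts are assumptions chosen for the example, and the absolute numbers will vary by machine and interpreter.

```python
import timeit

# The two expressions from the question.
STATEMENTS = ["'lol' in ['lol']", "'lol' == 'lol'"]


def best_time(stmt, number=1000000, repeat=5):
    """Hypothetical helper: best per-loop time for stmt, in microseconds."""
    # timeit.repeat runs `number` loops `repeat` times and returns total
    # seconds for each run; the minimum is the least noisy measurement.
    timings = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(timings) / number * 1e6


results = {stmt: best_time(stmt) for stmt in STATEMENTS}
for stmt in STATEMENTS:
    print("%-20s %.4f usec per loop" % (stmt, results[stmt]))
```

Because timeit runs its own loop, the cost of building a list with range in Python 2 no longer leaks into the measurement.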

So should I use ugly style like 'lol' in ['lol'] when i use python3?

No, readability counts.

Also, as others have noted, your test case has weaknesses:

$ python3 -m timeit "'lol' == 'lol'"
>> 10000000 loops, best of 3: 0.024 usec per loop
$ python3 -m timeit "'lol' in ['lol']"
>> 10000000 loops, best of 3: 0.0214 usec per loop
$ python2 -m timeit "'lol' == 'lol'"
>> 10000000 loops, best of 3: 0.0258 usec per loop
$ python2 -m timeit "'lol' in ['lol']"
>> 10000000 loops, best of 3: 0.0212 usec per loop

There is no difference between python2 and python3 when it comes to which comparison is faster.


Another source of confusion might be the opaque behavior of python interpreters[1] when it comes to string caching/interning. As a rule of thumb, strings shorter than four characters are interned and will refer to the same object. It can be tested with something like:

a = 'lol'
b = 'lol'
a is b  # tests for object id instead of applying an equality comparison
>> True

Other strings may also be interned, but an easy counterexample is a string with 4 characters that includes special characters:

a = '####'
b = '####'
a is b
>> False
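The result above is what an interactive session typically shows; inside a single script the compiler may share one constant between both names, so the outcome depends on how the code is run. A sketch that avoids this by building the strings at runtime, and that uses sys.intern (Python 3; it is an assumption that Python 3 is in use here) to request a shared object explicitly:

```python
import sys

# Build the strings at runtime so the compiler cannot reuse one constant.
a = "".join(["#"] * 4)
b = "".join(["#"] * 4)

equal_values = (a == b)   # compares the contents of the two strings
same_object = (a is b)    # compares identity; typically False in CPython here

# sys.intern returns the canonical object for a given string value,
# so two equal strings map to the same object afterwards.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
same_after_intern = (a_interned is b_interned)

print(equal_values, same_object, same_after_intern)
```

Whether two equal strings share an object is an implementation detail, so code should rely on == for comparing values and never on is.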

Of course, testing for object ids is faster than making an actual comparison, and your test using in did just that. Even though the code itself looks straightforward, the actual operation was unexpected. That also means that slightly different scenarios may lead to surprising results and funny bugs.
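The identity shortcut in list membership can be made visible with objects that never compare equal to themselves. A minimal sketch, where the class NeverEqual is a hypothetical helper invented for this illustration:

```python
nan = float("nan")

# NaN is never equal to itself, so a plain equality test fails...
equal_to_itself = (nan == nan)

# ...but membership in a list checks identity before equality,
# so the very same object is still found.
found_in_list = (nan in [nan])


class NeverEqual(object):
    """Hypothetical helper: reports itself as unequal to everything."""

    def __eq__(self, other):
        return False


obj = NeverEqual()
# __eq__ always returns False, yet the identity check succeeds first.
found_despite_eq = (obj in [obj])

print(equal_to_itself, found_in_list, found_despite_eq)
```

This is the kind of surprising result meant above: in and == can disagree for the same pair of objects.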

In conclusion I'd repeat once more: No, you should not prefer 'lol' in ['lol'] over 'lol' == 'lol'.

[1]: Only CPython. I do not know if other python interpreters do something similar.
