Python: why partition(sep) is faster than split(sep, maxsplit=1)
I found an interesting thing: partition is faster than split when getting the whole substring after the separator. I have tested this in Python 3.5 and 3.6 (CPython):
In [1]: s = 'validate_field_name'
In [2]: s.partition('_')[-1]
Out[2]: 'field_name'
In [3]: s.split('_', maxsplit=1)[-1]
Out[3]: 'field_name'
In [4]: %timeit s.partition('_')[-1]
220 ns ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit s.split('_', maxsplit=1)[-1]
745 ns ± 48.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: %timeit s[s.find('_')+1:]
340 ns ± 1.44 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
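As an aside, here is a small sketch of what s.partition(sep)[-1] computes via find plus slicing (after_sep is a hypothetical helper name, not anything from CPython). Note that the partition and split one-liners above are not strictly equivalent: they differ when the separator is absent.

```python
def after_sep(s, sep):
    """Everything after the first occurrence of sep, mirroring s.partition(sep)[-1]."""
    i = s.find(sep)
    # partition returns (s, '', '') when sep is absent, so [-1] is ''
    return '' if i == -1 else s[i + len(sep):]

print(after_sep('validate_field_name', '_'))  # field_name

# Edge case where the two one-liners disagree:
print('abc'.partition('_')[-1])   # '' (separator not found)
print('abc'.split('_', 1)[-1])    # 'abc'
```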
I looked through the CPython source code and found that partition uses the FASTSEARCH algorithm, see here. split only uses FASTSEARCH when the separator string is longer than one character, see here. But I have also tested with a longer separator string and got the same result.
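The longer-separator test can be reproduced with the timeit module (the separator '__' and iteration count here are arbitrary choices for illustration; absolute times will vary by machine and CPython version):

```python
import timeit

# Even with a multi-character separator, where split also takes the
# FASTSEARCH path, compare the two approaches directly.
s = 'validate__field__name'
t_partition = timeit.timeit("s.partition('__')[-1]", globals={'s': s}, number=100_000)
t_split = timeit.timeit("s.split('__', 1)[-1]", globals={'s': s}, number=100_000)
print(f'partition: {t_partition:.4f}s  split: {t_split:.4f}s')
```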
My guess is that the reason is that partition returns a three-element tuple instead of a list. I would like to know more details.
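That guess can be probed in isolation by timing tuple versus list construction (this only measures allocation cost, so at best it accounts for part of the gap; the values and iteration count are arbitrary):

```python
import timeit

# Build a three-element tuple vs. a three-element list. Variables (not
# literals) are used so the tuple is not constant-folded away by the
# compiler, which would make the comparison meaningless.
ns = {'a': 'validate', 'b': '_', 'c': 'field_name'}
t_tuple = timeit.timeit('(a, b, c)', globals=ns, number=1_000_000)
t_list = timeit.timeit('[a, b, c]', globals=ns, number=1_000_000)
print(f'tuple: {t_tuple:.4f}s  list: {t_list:.4f}s')
```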
Microbenchmarks can be misleading
py -m timeit "'validate_field_name'.split('_', maxsplit=1)[-1]"
1000000 loops, best of 3: 0.568 usec per loop
py -m timeit "'validate_field_name'.split('_', 1)[-1]"
1000000 loops, best of 3: 0.317 usec per loop
Just passing the argument as positional versus keyword changes the time significantly. So I would guess another reason partition is faster is that it does not need a second argument...
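The keyword-versus-positional difference can be checked with the timeit module as well (iteration count arbitrary; the point is only the relative gap, since keyword arguments take a slower argument-parsing path into the C-implemented method on the CPython versions discussed here):

```python
import timeit

# Same split call, with maxsplit passed by keyword vs. positionally.
s = 'validate_field_name'
t_kw = timeit.timeit("s.split('_', maxsplit=1)[-1]", globals={'s': s}, number=100_000)
t_pos = timeit.timeit("s.split('_', 1)[-1]", globals={'s': s}, number=100_000)
print(f'keyword: {t_kw:.4f}s  positional: {t_pos:.4f}s')
```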