Python: why partition(sep) is faster than split(sep, maxsplit=1)

I found something interesting: partition is faster than split when you want the whole substring after the first separator. I have tested this in Python 3.5 and 3.6 (CPython).

In [1]: s = 'validate_field_name'

In [2]: s.partition('_')[-1]
Out[2]: 'field_name'

In [3]: s.split('_', maxsplit=1)[-1]
Out[3]: 'field_name'

In [4]: %timeit s.partition('_')[-1]
220 ns ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit s.split('_', maxsplit=1)[-1]
745 ns ± 48.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit s[s.find('_')+1:]
340 ns ± 1.44 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

I looked through the CPython source code and found that partition uses the FASTSEARCH algorithm (see here). split only uses FASTSEARCH when the separator string is longer than one character (see here). However, I also tested with a longer separator string and got the same result.

My guess is that the reason is that partition returns a three-element tuple instead of a list.
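To make the return-type difference concrete, here is a small sketch. It only shows what each call returns; it does not establish that the tuple-versus-list difference is what causes the timing gap. The variable names `parts`, `pieces`, and `t` are illustrative and not from the original post.

```python
s = 'validate_field_name'

parts = s.partition('_')            # always a 3-tuple: (head, sep, tail)
pieces = s.split('_', maxsplit=1)   # a list with at most 2 items

print(type(parts).__name__, parts)
print(type(pieces).__name__, pieces)

# The three expressions from the question agree when the separator is present.
assert parts[-1] == pieces[-1] == s[s.find('_') + 1:] == 'field_name'

# They do not all agree when the separator is absent.
t = 'validate'
print(repr(t.partition('_')[-1]))            # '' (tail is empty)
print(repr(t.split('_', maxsplit=1)[-1]))    # 'validate' (single-item list)
print(repr(t[t.find('_') + 1:]))             # 'validate' (find returns -1)
```

So the expressions are interchangeable only when the separator is known to occur in the string, which is worth keeping in mind when choosing between them for speed.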

I would like to know more details.

Microbenchmarks can be misleading.

py -m timeit "'validate_field_name'.split('_', maxsplit=1)[-1]"
1000000 loops, best of 3: 0.568 usec per loop

py -m timeit "'validate_field_name'.split('_', 1)[-1]"
1000000 loops, best of 3: 0.317 usec per loop

Merely passing the argument by keyword instead of by position changes the timing significantly. So I would guess that another reason partition is faster is that it does not need a second argument at all...
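The comparison above can be reproduced from a script with the standard `timeit` module. This is a minimal sketch: the labels, the loop count `N`, and the `results` dictionary are assumptions for illustration, and the absolute numbers will vary by machine and Python version.

```python
import timeit

s = 'validate_field_name'
N = 200000  # calls per timing run (arbitrary choice)

candidates = {
    "partition": "s.partition('_')[-1]",
    "split, keyword maxsplit": "s.split('_', maxsplit=1)[-1]",
    "split, positional maxsplit": "s.split('_', 1)[-1]",
}

results = {}
for label, stmt in candidates.items():
    # Take the minimum of several runs; it is the least affected by noise.
    best = min(timeit.repeat(stmt, globals={"s": s}, number=N, repeat=5))
    results[label] = best / N * 1e9  # nanoseconds per call
    print("{:28s} {:8.1f} ns".format(label, results[label]))
```

Running all three variants in the same process and with the same loop count makes it easier to tell argument-passing overhead apart from the cost of the string operation itself.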
