Python sort() method on list vs builtin sorted() function
I know that the __builtin__ sorted() function works on any iterable. But can someone explain this huge (10x) performance difference between anylist.sort() and sorted(anylist)? Also, please point out if I am doing anything wrong with the way this is measured.

"""
Example Output:
$ python list_sort_timeit.py
Using sort method: 20.0662879944
Using sorted builtin method: 259.009809017
"""

import random
import timeit

print 'Using sort method:',
x = min(timeit.Timer("test_list1.sort()","import random;test_list1=random.sample(xrange(1000),1000)").repeat())
print x

print 'Using sorted builtin method:',
x = min(timeit.Timer("sorted(test_list2)","import random;test_list2=random.sample(xrange(1000),1000)").repeat())
print x
So I wrote this version to test, sorting both lists once in the setup, and yes, the two timings are very close.

"""
Example Output:
$ python list_sort_timeit.py
Using sort method: 19.0166599751
Using sorted builtin method: 23.203567028
"""

import random
import timeit

print 'Using sort method:',
x = min(timeit.Timer("test_list1.sort()","import random;test_list1=random.sample(xrange(1000),1000);test_list1.sort()").repeat())
print x

print 'Using sorted builtin method:',
x = min(timeit.Timer("sorted(test_list2)","import random;test_list2=random.sample(xrange(1000),1000);test_list2.sort()").repeat())
print x
Oh, I see Alex Martelli posted a response while I was typing this one. (I shall leave the edit, as it might be useful.)
Your error in measurement is as follows: after your first call of test_list1.sort(), that list object IS sorted -- and Python's sort, aka timsort, is wickedly fast on already sorted lists!!! That's the most frequent error in using timeit -- inadvertently getting side effects and not accounting for them.
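To see the "already sorted" effect without relying on a clock, one option is to count comparisons instead of timing them. This is only a sketch in Python 3 syntax; the `Counted` wrapper and the `count_comparisons` helper are names made up for this illustration, not anything from the standard library.

```python
import random


class Counted:
    """Wraps a value and counts every `<` comparison made while sorting."""

    comparisons = 0  # shared counter, reset before each measurement

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Sort a fresh list of wrappers in place and return the comparisons used."""
    items = [Counted(v) for v in values]
    Counted.comparisons = 0
    items.sort()
    return Counted.comparisons


n = 1000
shuffled = random.sample(range(n), n)
on_shuffled = count_comparisons(shuffled)
on_sorted = count_comparisons(sorted(shuffled))

print("comparisons on shuffled input:", on_shuffled)
print("comparisons on sorted input:  ", on_sorted)
```

On sorted input the count should stay close to the length of the list, while shuffled input needs several times more, which is why a benchmark that keeps re-sorting the same list mostly measures the cheap case.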
Here's a good set of measurements, using timeit from the command line as it's best used:
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'y=list(x); y.sort()'
1000 loops, best of 3: 452 usec per loop
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'x.sort()'
10000 loops, best of 3: 37.4 usec per loop
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'sorted(x)'
1000 loops, best of 3: 462 usec per loop
As you see, y.sort() and sorted(x) are neck and neck, but x.sort(), thanks to the side effects, gains over an order of magnitude's advantage -- just because of your measurement error, though: this tells you nothing about sort vs sorted per se!
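For anyone who prefers to run the comparison from a script, here is a rough Python 3 equivalent using the timeit module API. It is a sketch under my own assumptions: the labels, the `best_time` helper and the loop counts are illustrative choices, and the absolute numbers will differ by machine and Python version.

```python
import timeit

# The setup runs once per timing run, so `x` only stays shuffled
# if the timed statement does not mutate it.
SETUP = "import random; x = list(range(1000)); random.shuffle(x)"

STATEMENTS = {
    "copy + sort": "y = list(x); y.sort()",  # fresh copy each loop: fair
    "sorted": "sorted(x)",                   # builds a new list each loop: fair
    "in-place (flawed)": "x.sort()",         # x is sorted after the first loop
}


def best_time(stmt, number=200, repeat=3):
    """Return the best per-loop time in seconds for one statement."""
    runs = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(runs) / number


results = {label: best_time(stmt) for label, stmt in STATEMENTS.items()}
for label, seconds in results.items():
    print(f"{label:>18}: {seconds * 1e6:8.1f} usec per loop")
```

The first two statements leave `x` untouched, so every loop sorts shuffled data. The third is kept only to reproduce the flawed measurement from the question.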
Because list.sort sorts in place: the first call actually sorts the list, but every later call is sorting an already sorted list.
In the timeit case most of the time is spent copying, and sorted also does one more copy. E.g. try this and you will get about the same results for both:
import time
import random

test_list1=random.sample(xrange(1000),1000)
test_list2=random.sample(xrange(1000),1000)

# in-place sort: only the first pass sees unsorted data
s=time.time()
for i in range(100):
    test_list1.sort()
print time.time()-s

# rebinding the name to the sorted copy has the same effect
s=time.time()
for i in range(100):
    test_list2=sorted(test_list2)
print time.time()-s
Well, the .sort() method of lists sorts the list in place, while sorted() creates a new list. So if you have a large list, part of your performance difference will be due to copying.
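A small Python 3 illustration of that difference, with variable names invented for the example:

```python
data = [3, 1, 2]

# sorted() accepts any iterable and returns a new list; the input is untouched.
new_list = sorted(data)
from_tuple = sorted((3, 1, 2))
from_string = sorted("cab")
print(new_list, data)           # [1, 2, 3] [3, 1, 2]
print(from_tuple, from_string)  # [1, 2, 3] ['a', 'b', 'c']

# list.sort() rearranges the existing list and returns None.
result = data.sort()
print(result, data)             # None [1, 2, 3]
```

So sorted(x) pays for building a second list on top of the sort itself, whereas x.sort() reuses the storage it already has.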
Still, an order of magnitude difference seems larger than I'd expect. Perhaps list.sort() has some special-cased optimization that sorted() can't make use of. For example, since the list class already has an internal PyObject*[] array of the right size, perhaps it can perform swaps more efficiently.
Edit: Alex and Anurag are right, the order of magnitude difference is due to you accidentally sorting an already-sorted list in your test case. However, as Alex's benchmarks show, list.sort() is about 2% faster than sorted(), which would make sense due to the copying overhead.