Python sort() method on list vs builtin sorted() function

Question

I know that the __builtin__ sorted() function works on any iterable. But can someone explain this huge (10x) performance difference between anylist.sort() and sorted(anylist)? Also, please point out if I am doing anything wrong with the way this is measured.

"""
Example Output:
$ python list_sort_timeit.py
Using sort method: 20.0662879944
Using sorted builtin method: 259.009809017
"""

import random
import timeit

print 'Using sort method:',
x = min(timeit.Timer("test_list1.sort()","import random;test_list1=random.sample(xrange(1000),1000)").repeat())
print x

print 'Using sorted builtin method:',
x = min(timeit.Timer("sorted(test_list2)","import random;test_list2=random.sample(xrange(1000),1000)").repeat())
print x

As the title says, I was interested in comparing list.sort() vs sorted(list). The above snippet showed something interesting: Python's sort function behaves very well on already sorted data. As pointed out by Anurag, in the first case the sort method is working on already sorted data, while in the second case sorted() is handed a fresh, unsorted list and has to do the work again and again.

So I wrote this one to test, and yes, they are very close.

"""
Example Output:
$ python list_sort_timeit.py
Using sort method: 19.0166599751
Using sorted builtin method: 23.203567028
"""

import random
import timeit

print 'Using sort method:',
x = min(timeit.Timer("test_list1.sort()","import random;test_list1=random.sample(xrange(1000),1000);test_list1.sort()").repeat())
print x

print 'Using sorted builtin method:',
x = min(timeit.Timer("sorted(test_list2)","import random;test_list2=random.sample(xrange(1000),1000);test_list2.sort()").repeat())
print x

Oh, I see Alex Martelli posted a response while I was typing this one. (I shall leave the edit, as it might be useful.)

Answer 1

Your error in measurement is as follows: after your first call of test_list1.sort(), that list object IS sorted, and Python's sort, aka timsort, is wickedly fast on already sorted lists! That's the most frequent error in using timeit: inadvertently getting side effects and not accounting for them.
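To make the "fast on already sorted lists" point concrete, here is a minimal sketch that counts how many comparisons the sort performs on shuffled versus already sorted input. It is written for Python 3 (the thread itself uses Python 2), and the helper name count_comparisons is made up for this illustration; counting through functools.cmp_to_key is just one convenient way to observe the comparisons.

```python
import random
from functools import cmp_to_key


def count_comparisons(data):
    """Sort a copy of `data` and return how many comparisons the sort made.

    Hypothetical helper for illustration only.
    """
    calls = 0

    def counting_cmp(a, b):
        nonlocal calls
        calls += 1
        # Classic three-way comparison: negative, zero, or positive.
        return (a > b) - (a < b)

    # sorted() leaves `data` untouched, so the measurement has no side effects.
    sorted(data, key=cmp_to_key(counting_cmp))
    return calls


random.seed(0)  # fixed seed so repeated runs see the same shuffle
shuffled = random.sample(range(1000), 1000)
already_sorted = sorted(shuffled)

shuffled_cmps = count_comparisons(shuffled)
sorted_cmps = count_comparisons(already_sorted)

print("comparisons on shuffled input:", shuffled_cmps)
print("comparisons on sorted input:  ", sorted_cmps)
```

On sorted input the comparison count should come out far lower than on shuffled input, which is what makes repeated in-place sorting of the same list look so cheap.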

Here's a good set of measurements, using timeit from the command line, as it's best used:

$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' '
y=list(x); y.sort()'
1000 loops, best of 3: 452 usec per loop
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' '
x.sort()'
10000 loops, best of 3: 37.4 usec per loop
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' '
sorted(x)'
1000 loops, best of 3: 462 usec per loop

As you see, y.sort() and sorted(x) are neck and neck, but x.sort(), thanks to the side effects, gains over an order of magnitude's advantage. That is just because of your measurement error, though: it tells you nothing about sort vs sorted per se!
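For readers who prefer to run the comparison from a script, the following is a rough Python 3 sketch of the same three measurements using the timeit module. The helper time_stmt and the variable names are assumptions made for this example, and the absolute numbers will differ from the ones quoted above.

```python
import random
import timeit

data = list(range(1000))
random.shuffle(data)
in_place = list(data)  # separate copy that the skewed measurement will mutate


def time_stmt(stmt, number=200, repeat=3):
    """Return the best per-call time in seconds for `stmt` (hypothetical helper)."""
    namespace = {"data": data, "in_place": in_place}
    best = min(timeit.repeat(stmt, globals=namespace, repeat=repeat, number=number))
    return best / number


# Fair: each call copies the shuffled list, then sorts the copy.
copy_then_sort = time_stmt("y = list(data); y.sort()")
# Fair: sorted() builds and sorts a new list on each call.
builtin_sorted = time_stmt("sorted(data)")
# Skewed: only the very first call sees shuffled data; later calls re-sort sorted data.
in_place_skewed = time_stmt("in_place.sort()")

print("list(x) + y.sort(): %.1f usec" % (copy_then_sort * 1e6))
print("sorted(x):          %.1f usec" % (builtin_sorted * 1e6))
print("x.sort() (skewed):  %.1f usec" % (in_place_skewed * 1e6))
```

The first two statements never mutate data, so every timed call sorts genuinely shuffled input; the third reproduces the side effect described above.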

Answer 2

Because list.sort sorts in place: the first call sorts the list, but every later call is sorting an already sorted list.

E.g. try this and you will get the same results. In the timeit case most of the time is spent copying, and sorted also makes one more copy.

import time
import random
test_list1=random.sample(xrange(1000),1000)
test_list2=random.sample(xrange(1000),1000)

s=time.time()
for i in range(100):
    test_list1.sort()
print time.time()-s

s=time.time()
for i in range(100):
    test_list2=sorted(test_list2)
print time.time()-s

Answer 3

Well, the .sort() method of lists sorts the list in place, while sorted() creates a new list. So if you have a large list, part of your performance difference will be due to copying.
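A small Python 3 sketch of that difference, with variable names chosen only for illustration: sorted() returns a new list and accepts any iterable, while list.sort() reorders the existing list and returns None.

```python
original = [3, 1, 2]

# sorted() returns a new list and leaves its argument untouched.
new_list = sorted(original)
unchanged_after_sorted = list(original)

# sorted() accepts any iterable, not just lists.
from_tuple = sorted((3, 1, 2))
from_string = sorted("bca")

# list.sort() reorders the same list object in place and returns None.
return_value = original.sort()

print(new_list, unchanged_after_sorted, from_tuple, from_string)
print(return_value, original)
```

Because sort() returns None, writing something like x = x.sort() throws the list away, which is a common slip when switching between the two.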

Still, an order of magnitude difference seems larger than I'd expect. Perhaps list.sort() has some special-cased optimization that sorted() can't make use of. For example, since the list class already has an internal PyObject*[] array of the right size, perhaps it can perform swaps more efficiently.

Edit: Alex and Anurag are right, the order of magnitude difference is due to you accidentally sorting an already-sorted list in your test case. However, as Alex's benchmarks show, list.sort() is about 2% faster than sorted(), which would make sense due to the copying overhead.
