Fastest way to sort multiple lists - Python

Question

I have two lists, x and y. I want to sort x and reorder y with the same permutation that sorts x. For example, given

x = [4, 2, 1, 3]
y = [40, 200, 1, 30]

I want to get

x_sorted = [1,2,3,4]
y_sorted = [1, 200, 30, 40]

As discussed in past questions, a simple way to solve this is

x_sorted, y_sorted = zip(*sorted(zip(x,y)))

Here is my question: what is the FASTEST way to do this?

I have three methods for this task.

import numpy as np
x = np.random.random(1000)
y = np.random.random(1000)

Method 1:

x_sorted, y_sorted = zip(*sorted(zip(x,y))) #1.08 ms

Method 2:

foo = zip(x,y)
foo.sort()
zip(*foo)       #1.05 ms

Method 3:

ind = range(1000)
ind.sort(key=lambda i:x[i])
x_sorted = [x[i] for i in ind]
y_sorted = [y[i] for i in ind]  #934us
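
The three methods above are written for Python 2, where zip and range return lists. As a rough sketch, they could be ported to Python 3 as follows. The plain-list test data and the fixed seed are assumptions made for illustration and are not part of the original question, so timings will differ from those quoted here.

```python
import random

random.seed(0)  # assumption: fixed seed so the run is repeatable
x = [random.random() for _ in range(1000)]
y = [random.random() for _ in range(1000)]

# Method 1: sort the (x, y) pairs, then unzip.
x1, y1 = zip(*sorted(zip(x, y)))

# Method 2: zip() is lazy in Python 3, so build a list before sorting in place.
pairs = list(zip(x, y))
pairs.sort()
x2, y2 = zip(*pairs)

# Method 3: range() is not a list in Python 3, so sort the indices with sorted().
ind = sorted(range(len(x)), key=x.__getitem__)
x3 = [x[i] for i in ind]
y3 = [y[i] for i in ind]
```

Methods 1 and 2 return tuples, while method 3 returns lists.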

Is there a better method that executes faster than the three above?

Additional questions.

Why is method 2 not faster than method 1, even though it uses the in-place sort method?
If I execute the steps of method 2 separately, it is faster. In an IPython terminal, I get

%timeit foo = zip(x,y)   #1000 loops, best of 3: 220 us per loop
%timeit foo.sort()       #10000 loops, best of 3: 78.9 us per loop
%timeit zip(*foo)        #10000 loops, best of 3: 73.8 us per loop

Answer 1

Using numpy.argsort:

>>> import numpy as np
>>> x = np.array([4,2,1,3])
>>> y = np.array([40,200,1,30])
>>> order = np.argsort(x)
>>> x_sorted = x[order]
>>> y_sorted = y[order]
>>> x_sorted
array([1, 2, 3, 4])
>>> y_sorted
array([  1, 200,  30,  40])

>>> timeit('order = np.argsort(x); x_sorted = x[order]; y_sorted = y[order]', 'from __main__ import x, y, np', number=1000)
0.030632019043

NOTE

This makes sense if the input data are already numpy arrays.
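
If the inputs start out as plain lists, the same idea can be wrapped in a small helper that converts them first. This is only a sketch: sort_together is a hypothetical name, and the conversion cost of np.asarray is part of the work, which is why the approach pays off mainly when the data are already arrays.

```python
import numpy as np

def sort_together(x, y):
    """Sort x and reorder y with the same permutation (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y)
    # kind="stable" keeps equal x values in their input order.
    order = np.argsort(x, kind="stable")
    return x[order], y[order]

x_sorted, y_sorted = sort_together([4, 2, 1, 3], [40, 200, 1, 30])
print(x_sorted.tolist())  # [1, 2, 3, 4]
print(y_sorted.tolist())  # [1, 200, 30, 40]
```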

Answer 2

>>> x = [4, 2, 1, 3]
>>> y = [40, 200, 1, 30]    
>>> x_sorted, y_sorted = zip(*sorted(zip(x, y), key=lambda a:a[0]))
>>> x_sorted
(1, 2, 3, 4)
>>> y_sorted
(1, 200, 30, 40)
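
Note that adding the key changes more than speed. Without a key, sorted compares whole tuples, so ties in x are broken by y. With a key on the first element, only x is compared and ties keep their input order. A small illustration with made-up data containing duplicate x values:

```python
x = [2, 1, 2, 1]
y = ["d", "c", "a", "b"]

# Without a key, tuples are compared in full, so ties in x are broken by y.
by_pair = sorted(zip(x, y))

# With a key on the first element, ties in x keep their input order (stable sort).
by_x_only = sorted(zip(x, y), key=lambda a: a[0])

print(by_pair)    # [(1, 'b'), (1, 'c'), (2, 'a'), (2, 'd')]
print(by_x_only)  # [(1, 'c'), (1, 'b'), (2, 'd'), (2, 'a')]
```

Comparing only the first element is also less work per comparison, which is one plausible reason the keyed version times faster.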

Performance:

>>> timeit('foo = zip(x,y); foo.sort(); zip(*foo)', 'from __main__ import x, y', number=1000)
1.0197240443760691
>>> timeit('zip(*sorted(zip(x,y)))', 'from __main__ import x, y', number=1000)
1.0106219310922597
>>> timeit('ind = range(1000); ind.sort(key=lambda i:x[i]); x_sorted = [x[i] for i in ind]; y_sorted = [y[i] for i in ind]', 'from __main__ import x, y', number=1000)
0.9043525504607857
>>> timeit('zip(*sorted(zip(x, y), key=lambda a:a[0]))', 'from __main__ import x, y', number=1000)
0.8288150863453723

To see the full picture:

>>> timeit('sorted(x)', 'from __main__ import x, y', number=1000)
0.40415491505723367            # just getting sorted list from x
>>> timeit('x.sort()', 'from __main__ import x, y', number=1000)
0.008009909448446706           # sort x inplace

@falsetru's method is the fastest for np.arrays:

>>> timeit('order = np.argsort(x); x_sorted = x[order]; y_sorted = y[order]', 'from __main__ import x, y, np', number=1000)
0.05441799872323827

As @AshwiniChaudhary suggested in the comments, for lists you can speed this up by using itertools.izip instead of zip:

>>> timeit('zip(*sorted(izip(x, y), key=itemgetter(0)))', 'from __main__ import x, y;from operator import itemgetter;from itertools import izip', number=1000)
0.4265049757161705

Answer 3

You're not timing this properly:

%timeit foo.sort()

After the first loop, the list is already sorted for the remaining loops. Timsort is very efficient on presorted lists.
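
One way to see the effect is to compare sorting the same list repeatedly with sorting unsorted data on every loop. The sketch below assumes Python 3 and uses made-up random data; absolute numbers depend on the machine, so none are shown.

```python
import random
import timeit

random.seed(0)  # assumption: fixed seed so the run is repeatable
data = [random.random() for _ in range(1000)]

# Flawed: the same list is sorted on every loop, so only the first loop
# sees unsorted input; later loops sort an already sorted list.
presorted = list(data)
t_flawed = timeit.timeit(presorted.sort, number=1000)

# Fairer: sorted() builds a new sorted list from the unsorted data on
# every loop, at the cost of including the copy in the measurement.
t_fair = timeit.timeit(lambda: sorted(data), number=1000)

print(f"in-place on the same list: {t_flawed:.4f} s")
print(f"fresh sort each loop:      {t_fair:.4f} s")
```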

I was a little surprised that @Roman's use of a key function was so much faster. You can improve on this further by using itemgetter:

from operator import itemgetter
ig0 = itemgetter(0)
zip(*sorted(zip(x, y), key=ig0))

This is about 9% faster than using a lambda function for lists of 1000 elements.
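
To check the lambda versus itemgetter difference on your own machine, a small harness along these lines may help. It assumes Python 3, where itertools.izip no longer exists and the built-in zip is already lazy; the function names and test data are made up for illustration, and the measured gap may not match the 9% quoted above.

```python
import random
import timeit
from operator import itemgetter

random.seed(0)  # assumption: fixed seed so the run is repeatable
x = [random.random() for _ in range(1000)]
y = [random.random() for _ in range(1000)]

def with_lambda():
    # sorted() runs eagerly here; only the outer zip is lazy.
    return zip(*sorted(zip(x, y), key=lambda a: a[0]))

def with_itemgetter():
    return zip(*sorted(zip(x, y), key=itemgetter(0)))

# Both variants must produce the same result before timing them.
assert list(with_lambda()) == list(with_itemgetter())

for fn in (with_lambda, with_itemgetter):
    best = min(timeit.repeat(fn, number=200, repeat=5))
    print(f"{fn.__name__}: {best:.4f} s for 200 runs")
```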
