Python: What is the fastest way to sort a sublist of keys from an ordered list?

Question

I have a question about the speed of sorting an unordered sublist of keys taken from a long list of keys. For example:

keys = ['a','c','b','f','e','d','p','t','s','y','h']
sub_list = ['y','b','a','p']

I have two ideas:

sublist = sorted(sub_list, key=keys)

or,

sublist = [key for key in keys if key in sub_list]

There might be better ways than these two for all I know. Any thoughts?

Answer 1

Just time it with timeit:

In [3]: %timeit sorted(sub_list, lambda a,b: cmp(keys.index(a), keys.index(b)))
100000 loops, best of 3: 6.22 us per loop

In [4]: %timeit sublist = [key for key in keys if key in sub_list]
1000000 loops, best of 3: 1.91 us per loop

EDIT (more methods):

%timeit sorted(sub_list, key=keys.index)
100000 loops, best of 3: 2.8 us per loop

These examples use IPython's %timeit magic command, but you can also use the timeit module directly:

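import timeit

# Setup code: runs once before timing and defines the two lists.
p = """
keys = ['a','c','b','f','e','d','p','t','s','y','h']
sub_list = ['y','b','a','p']"""

# Python 2 only: cmp() and the positional comparator argument were removed in Python 3.
s = "sorted(sub_list, lambda a,b: cmp(keys.index(a), keys.index(b)))"

# timeit() runs the statement 1,000,000 times by default and returns the total in seconds.
timeit.Timer(stmt=s, setup=p).timeit()
# 8.40028386496742

s = "[key for key in keys if key in sub_list]"
timeit.Timer(stmt=s, setup=p).timeit()
# 1.9661344551401498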

So you can just try all the methods you can think of and choose the fastest.
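The "try everything and compare" advice can be sketched as a small Python 3 harness. This is only an illustration: the `candidates` and `timings` names and the `position` lookup dict are assumptions introduced here, the set and dict variants are added for comparison, and the numbers it prints will differ from machine to machine.

```python
import timeit

keys = ['a', 'c', 'b', 'f', 'e', 'd', 'p', 't', 's', 'y', 'h']
sub_list = ['y', 'b', 'a', 'p']

# Hypothetical helpers for the faster variants: key -> position, and a set for membership.
position = {key: i for i, key in enumerate(keys)}
sub_set = set(sub_list)

# Each candidate is a zero-argument callable so timeit can call it directly.
candidates = {
    "sorted, key=keys.index": lambda: sorted(sub_list, key=keys.index),
    "sorted, key=dict lookup": lambda: sorted(sub_list, key=position.__getitem__),
    "comprehension, list membership": lambda: [k for k in keys if k in sub_list],
    "comprehension, set membership": lambda: [k for k in keys if k in sub_set],
}

timings = {}
for label, func in candidates.items():
    # number is kept small so the script finishes quickly; raise it for steadier results.
    timings[label] = timeit.timeit(func, number=100_000)

# Print the candidates from fastest to slowest.
for label, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{seconds:.4f} s  {label}")
```

With lists this short the differences are small, so it is worth rerunning the harness with data of realistic size before picking a winner.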

Answer 2

Why not just sub_list.sort()? It may not be the fastest, but it's certainly easy to understand.

Answer 3

I think you should use sub_list.sort rather than sorted, because .sort sorts in place, whereas sorted makes a copy of the sublist before sorting.
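The difference between the two calls can be seen in a minimal sketch (the variable names `copy_sorted` and `result` are illustrative only):

```python
sub_list = ['y', 'b', 'a', 'p']

# sorted() builds and returns a new list; the original is left untouched.
copy_sorted = sorted(sub_list)
print(sub_list)     # ['y', 'b', 'a', 'p']
print(copy_sorted)  # ['a', 'b', 'p', 'y']

# list.sort() reorders the existing list in place and returns None.
result = sub_list.sort()
print(sub_list)  # ['a', 'b', 'p', 'y']
print(result)    # None
```

Both calls accept the same key argument, so choosing between them is mostly a question of whether the original order still needs to be kept.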

The list comprehension you wrote is slow because the if clause has to scan through the entire sub_list (so it does up to n extra operations per key):

sublist = [key for key in keys if key in sub_list]

A much faster list comprehension would be this:

sub_set = set(sub_list)
sublist = [key for key in keys if key in sub_set]

because hash-based dict and set lookups are O(1) on average, whereas list lookups are O(n).
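As a runnable sketch of the set-based version, using the lists from the question (the result name `ordered` is chosen here for illustration):

```python
keys = ['a', 'c', 'b', 'f', 'e', 'd', 'p', 't', 's', 'y', 'h']
sub_list = ['y', 'b', 'a', 'p']

# Build the set once so each membership test is a hash lookup, not a list scan.
sub_set = set(sub_list)

# Walk keys in order and keep only the entries that belong to the sublist.
ordered = [key for key in keys if key in sub_set]
print(ordered)  # ['a', 'b', 'p', 'y']
```

Note that this filters keys rather than sorting sub_list, so duplicates in sub_list are not repeated in the result and any item missing from keys is silently dropped.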

Sorting is generally O(n log n), and a list comprehension is O(n).

However, assuming that by:

sublist = sorted(sub_list, key=keys)

you mean:

sublist = sorted(sub_list, key=keys.index)

you have list lookups instead of hash lookups. sorted calls the key function once per element, and each keys.index call is a linear scan of keys, so with n items in sub_list and m items in keys the sort costs O(n*m + n*log(n)) instead of O(n*log(n)).

To get the sort back to O(n log n), convert your key list to a dict that maps each key to its position, as follows:

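# Map each key to its position; a dict is not callable, so pass its lookup method as the key.
keys = dict(zip(keys, range(len(keys))))
sublist = sorted(sub_list, key=keys.__getitem__)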
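A self-contained sketch of the dict-based sort, assuming Python 3. The name `position` is hypothetical and is used here so that the original keys list is not overwritten:

```python
keys = ['a', 'c', 'b', 'f', 'e', 'd', 'p', 't', 's', 'y', 'h']
sub_list = ['y', 'b', 'a', 'p']

# Build the key -> index mapping once, in O(len(keys)).
position = {key: i for i, key in enumerate(keys)}

# Each key lookup is now an average O(1) dict access instead of a list scan.
ordered = sorted(sub_list, key=position.__getitem__)
print(ordered)  # ['a', 'b', 'p', 'y']
```

Using `position.__getitem__` means an item that is missing from keys raises KeyError rather than being sorted to an arbitrary place, which is usually the easier failure to notice.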


Answer 1
1 2012-12-05 20:51:42

Answer 2
0 2012-12-05 21:03:52

Answer 3
0 Accepted 2012-12-05 22:14:59
