
Python: what is the fastest way to sort a sublist of keys from an ordered list?

I have a question about the speed of sorting an unordered sublist of keys so that it follows the order of a longer list of keys. For example:

keys =['a','c','b','f','e','d','p','t','s','y','h']
sub_list = ['y','b','a','p']

I have two ideas:

sublist = sorted(sub_list, key=keys)

or,

sublist = [key for key in keys if key in sub_list]

For all I know, there might be better ways than these two. Any thoughts?

Just time it with timeit (the cmp-style call is Python 2 syntax):

In [3]: %timeit sorted(sub_list, lambda a,b: cmp(keys.index(a), keys.index(b)))
100000 loops, best of 3: 6.22 us per loop

In [4]: %timeit sublist = [key for key in keys if key in sub_list]
1000000 loops, best of 3: 1.91 us per loop

EDIT (more methods):

%timeit sorted(sub_list, key=keys.index)
100000 loops, best of 3: 2.8 us per loop

This example uses IPython's %timeit magic command, but you can use the timeit module yourself:

import timeit

p = """
keys =['a','c','b','f','e','d','p','t','s','y','h']
sub_list = ['y','b','a','p']"""

s = "sorted(sub_list, lambda a,b: cmp(keys.index(a), keys.index(b)))"

timeit.Timer(stmt=s, setup=p).timeit()
>>> 8.40028386496742

s = "[key for key in keys if key in sub_list]"
timeit.Timer(stmt=s, setup=p).timeit()
>>> 1.9661344551401498

So you can just try all the methods you can think of and choose the fastest.
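For readers on Python 3, where cmp() and the positional comparator argument no longer exist, the same comparison could be set up roughly as follows. This is only a sketch: the function names and the repeat count are made up for illustration, and functools.cmp_to_key stands in for the Python 2 cmp-style call.

```python
import timeit
from functools import cmp_to_key

keys = ['a', 'c', 'b', 'f', 'e', 'd', 'p', 't', 's', 'y', 'h']
sub_list = ['y', 'b', 'a', 'p']


def by_cmp():
    # Python 3 stand-in for the cmp-style call: the comparator returns
    # the difference of the two positions in keys.
    return sorted(sub_list, key=cmp_to_key(lambda a, b: keys.index(a) - keys.index(b)))


def by_index():
    return sorted(sub_list, key=keys.index)


def by_comprehension():
    return [key for key in keys if key in sub_list]


candidates = {
    'cmp_to_key': by_cmp,
    'key=keys.index': by_index,
    'comprehension': by_comprehension,
}

# Check that every candidate gives the same ordering before timing it.
results = {name: func() for name, func in candidates.items()}

# number=10000 is an arbitrary choice that keeps the run short.
timings = {name: timeit.timeit(func, number=10000) for name, func in candidates.items()}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:>16}: {seconds:.4f} s")
```

The absolute numbers depend on the machine and the Python version, so only the relative ordering is worth reading.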

Why not just sub_list.sort()? It may not be the fastest, but it's certainly easy to understand.

I think you should use sub_list.sort rather than sorted, because .sort sorts in place whereas sorted makes a copy of the sublist before sorting.
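A small sketch of the difference between the two calls. Note that a plain sort() without a key orders alphabetically, which matches the order of keys for the question's sample data only by coincidence, so the sample sub_list here is a made-up one where the two orders differ.

```python
keys = ['a', 'c', 'b', 'f', 'e', 'd', 'p', 't', 's', 'y', 'h']
sub_list = ['y', 'c', 'b']  # hypothetical sample where the two orders differ

alphabetical = sorted(sub_list)                  # new list, alphabetical order
copy_by_keys = sorted(sub_list, key=keys.index)  # new list, order of keys
original_untouched = list(sub_list)              # sorted() did not modify sub_list

sub_list.sort(key=keys.index)                    # sorts in place and returns None

print(alphabetical)        # ['b', 'c', 'y']
print(copy_by_keys)        # ['c', 'b', 'y']
print(original_untouched)  # ['y', 'c', 'b']
print(sub_list)            # ['c', 'b', 'y']
```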

The list comprehension you wrote is slow because the final if clause has to scan through sub_list for every key (up to n extra operations per key, where n is the length of sub_list):

sublist = [key for key in keys if key in sub_list]

A much faster list comprehension would be this:

sub_set = set(sub_list)
sublist = [key for key in keys if key in sub_set]

because hash-based set lookups are O(1) on average, whereas list lookups are O(n).
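As a self-contained sketch with the question's data (order_by_keys is a hypothetical helper name, not something from the thread):

```python
keys = ['a', 'c', 'b', 'f', 'e', 'd', 'p', 't', 's', 'y', 'h']
sub_list = ['y', 'b', 'a', 'p']


def order_by_keys(keys, sub_list):
    """Return the items of sub_list in the order they appear in keys."""
    wanted = set(sub_list)  # built once, O(len(sub_list))
    # Each membership test against the set is O(1) on average.
    return [key for key in keys if key in wanted]


ordered = order_by_keys(keys, sub_list)
print(ordered)  # ['a', 'b', 'p', 'y']
```

One thing to keep in mind with this approach: the result is drawn from keys, so items of sub_list that are missing from keys are silently dropped, and duplicates in sub_list are not repeated.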

Sorting is generally O(n log n), and a list comprehension is O(n).

However, assuming that by:

sublist = sorted(sub_list, key=keys)

you mean:

sublist = sorted(sub_list, key=keys.index)

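you have list lookups instead of hash lookups: each keys.index call scans the list, so it costs O(m) for a key list of length m. With key=keys.index the lookup runs once per element, adding O(n*m) on top of the O(n log n) sort; with a cmp-style comparator it runs on every comparison, so the sort goes from O(n log n) to O(m*n*log(n)), which is O((n**2)*log(n)) when the two lists are of similar length.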
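A rough way to check how many list lookups each variant performs is simply to count them. The sketch below assumes Python 3, uses functools.cmp_to_key as a stand-in for a cmp-style comparator, and the counter and function names are made up for illustration.

```python
from functools import cmp_to_key

keys = ['a', 'c', 'b', 'f', 'e', 'd', 'p', 't', 's', 'y', 'h']
sub_list = ['y', 'b', 'a', 'p']

# Number of keys.index calls made by each variant.
lookups = {'key': 0, 'cmp': 0}


def counted_key(item):
    lookups['key'] += 1
    return keys.index(item)


def counted_cmp(a, b):
    lookups['cmp'] += 2  # two keys.index calls per comparison
    return keys.index(a) - keys.index(b)


by_key = sorted(sub_list, key=counted_key)
by_cmp = sorted(sub_list, key=cmp_to_key(counted_cmp))

print(lookups)
```

The key function is called exactly once per element, while the comparator count grows with the number of comparisons the sort needs.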

To get the sorting back to O(n log n), convert your key list to a hash (a dict mapping each key to its position) as follows:

keys = dict(zip(keys, range(len(keys))))
# a dict is not callable, so pass its lookup method as the key function
sublist = sorted(sub_list, key=keys.__getitem__)
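The timings in this thread use an 11-element list, where constant factors dominate. To see how the approaches behave on larger inputs, something like the following could be tried. The sizes, seed, and function names are arbitrary assumptions, and the dict and set are built once outside the timed functions.

```python
import random
import timeit

rng = random.Random(0)            # fixed seed so runs are repeatable
keys = list(range(2000))          # made-up size
rng.shuffle(keys)
sub_list = rng.sample(keys, 500)  # made-up size

position = {key: i for i, key in enumerate(keys)}  # key -> index in keys
sub_set = set(sub_list)


def with_list_index():
    return sorted(sub_list, key=keys.index)


def with_dict_lookup():
    return sorted(sub_list, key=position.__getitem__)


def with_set_filter():
    return [key for key in keys if key in sub_set]


# All three should agree before the timings mean anything.
expected = with_set_filter()
same = with_list_index() == expected and with_dict_lookup() == expected

for func in (with_list_index, with_dict_lookup, with_set_filter):
    print(func.__name__, timeit.timeit(func, number=20))
```

Note that position.__getitem__ raises KeyError if sub_list contains an item that is not in keys, whereas the set-based filter would just leave it out.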
