Sort tuple list with another list

I have a tuple list to_order such as:

to_order = [(0, 1), (1, 3), (2, 2), (3,2)]

And a list which gives the order to apply to the second element of each tuple of to_order:

order = [2, 1, 3]

So I am looking for a way to get this output:

ordered_list = [(2, 2), (3,2), (0, 1), (1, 3)]

Any ideas?

You can provide a key that looks up the index of each tuple's second element in order and sorts based on it:

to_order = [(0, 1), (1, 3), (2, 2), (3,2)]
order = [2, 1, 3]
print(sorted(to_order, key=lambda item: order.index(item[1]))) # [(2, 2), (3, 2), (0, 1), (1, 3)]
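
As a side note, order.index does a linear scan of order on every call. A minimal sketch of a variant that precomputes the positions in a dict first is shown here; the name rank is hypothetical (it does not come from the answer above), and the sketch assumes the values in order are unique:

```python
to_order = [(0, 1), (1, 3), (2, 2), (3, 2)]
order = [2, 1, 3]

# Hypothetical lookup table: value -> its position in `order`.
# Assumes `order` has no duplicate values.
rank = {value: position for position, value in enumerate(order)}

# sorted() is stable, so pairs sharing a second element keep their original relative order.
result = sorted(to_order, key=lambda item: rank[item[1]])
print(result)  # [(2, 2), (3, 2), (0, 1), (1, 3)]
```

With this variant, a second element missing from order raises KeyError rather than ValueError.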

EDIT

Since a discussion on time complexities was started, here you go: the following algorithm runs in O(n+m), using Eric's input example:

from random import randrange, shuffle

N = 5
to_order = [(randrange(N), randrange(N)) for _ in range(10*N)]
order = list(set(pair[1] for pair in to_order))
shuffle(order)


def eric_sort(to_order, order):
    bins = {}

    for pair in to_order:
        bins.setdefault(pair[1], []).append(pair)

    return [pair for i in order for pair in bins[i]]


def alfasin_new_sort(to_order, order):
    arr = [[] for i in range(len(order))]
    d = {k:v for v, k in enumerate(order)}
    for item in to_order:
        arr[d[item[1]]].append(item) 
    return [item for sublist in arr for item in sublist]


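from timeit import timeit
# The original `setup` string was not shown; globals=globals() lets timeit
# see the functions and data defined above instead.
print("eric_sort", timeit("eric_sort(to_order, order)", globals=globals(), number=1000))
print("alfasin_new_sort", timeit("alfasin_new_sort(to_order, order)", globals=globals(), number=1000))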

OUTPUT:

eric_sort 59.282021682999584
alfasin_new_sort 44.28244407700004

Algorithm

You can distribute the tuples into a dict of lists according to their second element, then iterate over order to build the sorted list:

from collections import defaultdict
to_order = [(0, 1), (1, 3), (2, 2), (3, 2)]
order = [2, 1, 3]

bins = defaultdict(list)

for pair in to_order:
    bins[pair[1]].append(pair)

print(bins)
# defaultdict(<class 'list'>, {1: [(0, 1)], 3: [(1, 3)], 2: [(2, 2), (3, 2)]})

print([pair for i in order for pair in bins[i]])
# [(2, 2), (3, 2), (0, 1), (1, 3)]

Neither sort nor index is needed, and the output is stable.
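
For reuse, the same steps can be wrapped in a function. This is only a sketch: the name sort_by_bins and the extra pair (4, 9) are hypothetical additions for illustration. It also shows that a pair whose second element is not in order is left out of the result:

```python
from collections import defaultdict


def sort_by_bins(pairs, order):
    """Group pairs by their second element, then emit the groups following `order`."""
    bins = defaultdict(list)
    for pair in pairs:
        bins[pair[1]].append(pair)  # appending keeps the input order inside each bin
    # .get() avoids creating empty bins for keys that never occur in `pairs`
    return [pair for key in order for pair in bins.get(key, [])]


# (4, 9) is a hypothetical extra pair whose second element is not in `order`
pairs = [(0, 1), (1, 3), (2, 2), (3, 2), (4, 9)]
print(sort_by_bins(pairs, [2, 1, 3]))
# [(2, 2), (3, 2), (0, 1), (1, 3)]
```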

This algorithm is similar to the mapping mentioned in the supposed duplicate. That linked answer only works if to_order and order have the same length, which isn't the case in OP's question.

Performance

This algorithm iterates twice over each element of to_order. The complexity is O(n). @alfasin's first algorithm is much slower (O(n * m * log n)), but his second one is also O(n). Note that in the benchmark below, alfasin1 is his O(n) version and alfasin2 is the sorted/index version.

Here's a list with 10000 random pairs of integers between 0 and 1000. We extract the unique second elements and shuffle them in order to define order:

from random import randrange, shuffle
from collections import defaultdict
from timeit import timeit
from itertools import chain

N = 1000
to_order = [(randrange(N), randrange(N)) for _ in range(10*N)]
order = list(set(pair[1] for pair in to_order))
shuffle(order)


def eric(to_order, order):
    bins = defaultdict(list)
    for pair in to_order:
        bins[pair[1]].append(pair)
    return list(chain.from_iterable(bins[i] for i in order))


def alfasin1(to_order, order):
    arr = [[] for i in range(len(order))]
    d = {k:v for v, k in enumerate(order)}
    for item in to_order:
        arr[d[item[1]]].append(item) 
    return [item for sublist in arr for item in sublist]

def alfasin2(to_order, order):
    return sorted(to_order, key=lambda item: order.index(item[1]))

print(eric(to_order, order) == alfasin1(to_order, order))
# True
print(eric(to_order, order) == alfasin2(to_order, order))
# True

print("eric", timeit("eric(to_order, order)", globals=globals(), number=100))
# eric 0.3117517130003762
print("alfasin1", timeit("alfasin1(to_order, order)", globals=globals(), number=100))
# alfasin1 0.36100843100030033
print("alfasin2", timeit("alfasin2(to_order, order)", globals=globals(), number=100))
# alfasin2 15.031453827000405

Another solution: [item for key in order for item in filter(lambda x: x[1] == key, to_order)]

This solution works off of order first, filtering to_order for each key in order.

Equivalent:

ordered = []
for key in order:
    for item in filter(lambda x: x[1] == key, to_order):
        ordered.append(item)

Shorter, but I'm not aware of a way to do this with list comprehension:

ordered = []
for key in order:
    ordered.extend(filter(lambda x: x[1] == key, to_order))

Note: This will not throw a ValueError if to_order contains a tuple x where x[1] is not in order.
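
To make that difference concrete, here is a small sketch comparing the two behaviours. The extra pair (4, 7) is a hypothetical addition to the question's data, chosen so that its second element is missing from order:

```python
to_order = [(0, 1), (1, 3), (2, 2), (3, 2), (4, 7)]  # 7 is not in `order`
order = [2, 1, 3]

# filter-based approach: the unmatched pair (4, 7) is silently left out
filtered = [item for key in order for item in filter(lambda x: x[1] == key, to_order)]
print(filtered)  # [(2, 2), (3, 2), (0, 1), (1, 3)]

# index-based key: order.index(7) raises ValueError
try:
    sorted(to_order, key=lambda item: order.index(item[1]))
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```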

I personally prefer the list object's sort method over the built-in sorted function, which generates a new list rather than changing the list in place.

to_order = [(0, 1), (1, 3), (2, 2), (3,2)]
order = [2, 1, 3]
to_order.sort(key=lambda x: order.index(x[1]))
print(to_order)
# [(2, 2), (3, 2), (0, 1), (1, 3)]

A little explanation along the way: the key parameter of the sort method basically preprocesses the list and ranks all the values based on a measure. In our case, order.index() looks up the first occurrence of the currently processed item's second element in order and returns its position.

x = [1, 2, 3, 4, 5, 3, 3, 5]
print(x.index(5))
# 4
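
The "preprocessing" can be observed directly: list.sort computes the key once per element and then sorts by those values. A minimal sketch, where the rank function and the calls list are hypothetical names introduced only to count the key calls:

```python
to_order = [(0, 1), (1, 3), (2, 2), (3, 2)]
order = [2, 1, 3]
calls = []  # hypothetical log of every pair the key function receives


def rank(pair):
    calls.append(pair)
    return order.index(pair[1])  # position of the second element in `order`


to_order.sort(key=rank)
print(len(calls))  # 4
print(to_order)    # [(2, 2), (3, 2), (0, 1), (1, 3)]
```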
