
Python - Sorting elements in a list of lists

Apologies if this has been answered elsewhere; I've tried searching, but haven't found anything that answers my question (or perhaps I have, but didn't understand it)...

I'm fairly new to Python (v2.6.2) and have a list of lists containing floating point values which looks something like the following (except that the full thing has 2+ million entries in each list):

cat = [[152.123, 150.456, 151.789, ...], [4.123, 3.456, 1.789, ...], [20.123, 22.456, 21.789, ...]]

Now what I would like to do is sort all 3 of the lists in ascending order of the elements of the 3rd list, such that I get:

cat_sorted = [[152.123, 151.789, 150.456, ...], [4.123, 1.789, 3.456, ...], [20.123, 21.789, 22.456, ...]]

I've tried a few things, but they don't give me what I'm looking for (or perhaps I'm using them incorrectly). Is there a way to do what I am looking for and, if so, what's the easiest and quickest (considering I have 3 x 2 million entries)? Is there a way of sorting one list using another?

This is going to be painful, but with plain Python you have two options:

  • Decorate the 1st and 2nd lists with enumerate(), then sort these using the index to refer to values from the 3rd list:

     cat_sorted = [
         [e for i, e in sorted(enumerate(cat[0]), key=lambda p: cat[2][p[0]])],
         [e for i, e in sorted(enumerate(cat[1]), key=lambda p: cat[2][p[0]])],
         sorted(cat[2]),
     ]

    although it may help to sort cat[2] in-place instead of using sorted(); you cannot avoid using sorted() for the other two.

  • zip() the three lists together, then sort on the third element of this new list of lists, then zip() again to get back to the original structure:

     from operator import itemgetter
     cat_sorted = zip(*sorted(zip(*cat), key=itemgetter(2)))

Neither will be fast, not with plain Python lists of millions of numbers.
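
As a rough check, the two options can be run side by side on the three-element sample from the question. This is only a sketch: the helper name sort_by_third is made up for illustration, and the list(...) conversions are an assumption added so the result is a list of lists on Python 3 as well, where zip() is lazy and yields tuples.

```python
from operator import itemgetter

cat = [[152.123, 150.456, 151.789],
       [4.123, 3.456, 1.789],
       [20.123, 22.456, 21.789]]


def sort_by_third(values, third):
    # Hypothetical helper for option 1: enumerate() pairs each value with
    # its index, and the key looks up the matching entry of the third list.
    return [e for i, e in sorted(enumerate(values), key=lambda p: third[p[0]])]


# Option 1: decorate the first two lists, sort the third directly.
option1 = [sort_by_third(cat[0], cat[2]),
           sort_by_third(cat[1], cat[2]),
           sorted(cat[2])]

# Option 2: transpose, sort the rows on their third element, transpose back.
option2 = [list(col) for col in zip(*sorted(zip(*cat), key=itemgetter(2)))]

print(option1 == option2)
print(option2)
```

Option 2 sorts only once, while option 1 sorts three times, which is one reason to prefer it when the lists are large.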

If you're willing to use an additional library, I suggest Python Pandas. It has a DataFrame object similar to R's data.frame and accepts a list of lists in the constructor. Each inner list becomes a row, so transpose the result to get a 3-column frame. You can then use the built-in pandas.DataFrame.sort function (replaced by pandas.DataFrame.sort_values in current pandas versions) to sort by the third column (ascending or descending).

There are many plain Python ways to do this, but given the size of your problem, using the optimized functions in Pandas is a better approach. And if you need any kind of aggregated statistics from your sorted data, then Pandas is a no-brainer.

The general approach I would take is to do a Schwartzian transform on the whole thing.

Zip the three lists together into a list of tuples.

Sort the tuples using the third element as key.

Iterate over the newly sorted list of tuples and fill in the three lists again.
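
The three steps above could look roughly like this on the sample data from the question. It is a sketch, not a tuned implementation; the names rows, first, second and third are chosen here for illustration.

```python
cat = [[152.123, 150.456, 151.789],
       [4.123, 3.456, 1.789],
       [20.123, 22.456, 21.789]]

# Step 1: zip the three lists together into a list of (a, b, c) tuples.
rows = list(zip(cat[0], cat[1], cat[2]))

# Step 2: sort the tuples using the third element as the key.
rows.sort(key=lambda row: row[2])

# Step 3: iterate over the sorted tuples and fill in three lists again.
first, second, third = [], [], []
for a, b, c in rows:
    first.append(a)
    second.append(b)
    third.append(c)

cat_sorted = [first, second, third]
print(cat_sorted)
```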

For the sake of completeness, a solution using numpy:

import numpy as np

cat = [[152.123, 150.456, 151.789],
        [4.123, 3.456, 1.789],
        [20.123, 22.456, 21.789]]

cat = np.array(cat) 
cat_sorted = cat[:, cat[2].argsort()]

print(cat_sorted)
[[ 152.123  151.789  150.456]
 [   4.123    1.789    3.456]
 [  20.123   21.789   22.456]]

Here is another way to do it, based on the great answers by Martijn Pieters and pcalcao:

def sort_by_last(ll):
    """
        >>> sort_by_last([[10, 20, 30], [3, 2, 1]])
        [[30, 20, 10], [1, 2, 3]]

        >>> sort_by_last([[10, 20, 30], [40, 50, 60], [3, 2, 1]])
        [[30, 20, 10], [60, 50, 40], [1, 2, 3]]

        >>> sort_by_last([[10, 20, 30], [40, 50, 60], [1, 1, 1]])
        [[10, 20, 30], [40, 50, 60], [1, 1, 1]]

        >>> sort_by_last([[10, 20, 30], [40, 50, 60], [1, 3, 1]])
        [[10, 30, 20], [40, 60, 50], [1, 1, 3]]

        >>> sort_by_last([[152.123, 150.456, 151.789], [4.123, 3.456, 1.789], [20.123, 22.456, 21.789]])
        [[152.123, 151.789, 150.456], [4.123, 1.789, 3.456], [20.123, 21.789, 22.456]]
    """
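    # Compute the sort order once from the last list and apply it to every list;
    # x.index(y) would cost O(n) per lookup and pick the wrong slot for duplicate values.
    order = sorted(range(len(ll[-1])), key=ll[-1].__getitem__)
    return [[x[i] for i in order] for x in ll]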

The big string there is a docstring with doctests. To test the function, copy it to a file and run it with python -m doctest -v <file>.

Here, keys is the list of indices into cat[2], ordered by the values they point to.

keys = sorted(range(len(cat[2])), key=cat[2].__getitem__)
cat_sorted = [[cat[i][k] for k in keys] for i in range(3)]
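
The same idea can be wrapped in a small function so that it works for any number of lists and any key list. The name sort_lists_by and its key_index parameter are hypothetical, introduced only for this sketch. Because sorted() is stable, entries with equal keys keep their original relative order.

```python
def sort_lists_by(lists, key_index):
    """Reorder every list in `lists` by ascending values of lists[key_index]."""
    key_list = lists[key_index]
    # Indices of key_list, arranged in the order that sorts its values.
    keys = sorted(range(len(key_list)), key=key_list.__getitem__)
    # Apply the same index order to every list.
    return [[lst[k] for k in keys] for lst in lists]


cat = [[152.123, 150.456, 151.789],
       [4.123, 3.456, 1.789],
       [20.123, 22.456, 21.789]]

cat_sorted = sort_lists_by(cat, 2)
print(cat_sorted)

# Duplicate key values: the two entries with key 1 stay in their original order.
ties = sort_lists_by([[10, 20, 30], [1, 3, 1]], 1)
print(ties)
```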


 