
Sort list by nested tuple values

Is there a better way to sort a list by nested tuple values than writing an itemgetter alternative that extracts the nested value, as below?

def deep_get(*idx):
    def g(t):
        for i in idx:
            t = t[i]
        return t
    return g

>>> l = [((2,1), 1),((1,3), 1),((3,6), 1),((4,5), 2)]
>>> sorted(l, key=deep_get(0,0))
[((1, 3), 1), ((2, 1), 1), ((3, 6), 1), ((4, 5), 2)]
>>> sorted(l, key=deep_get(0,1))
[((2, 1), 1), ((1, 3), 1), ((4, 5), 2), ((3, 6), 1)]

I thought about using compose, but that's not in the standard library:

sorted(l, key=compose(itemgetter(1), itemgetter(0)))
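For reference, such a helper is short to write by hand. The following is only a sketch: compose here is a hypothetical function, not something from the standard library, and its right-to-left application order is an assumption chosen to match the call above.

```python
from functools import reduce
from operator import itemgetter

def compose(*funcs):
    """Hypothetical helper: compose(f, g)(x) == f(g(x))."""
    def composed(x):
        # Apply the functions right to left, feeding each result to the next.
        return reduce(lambda acc, f: f(acc), reversed(funcs), x)
    return composed

l = [((2, 1), 1), ((1, 3), 1), ((3, 6), 1), ((4, 5), 2)]

# itemgetter(0) runs first (picks the inner tuple), then itemgetter(1) picks b.
by_b = sorted(l, key=compose(itemgetter(1), itemgetter(0)))
```

With this definition the key is equivalent to deep_get(0, 1).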

Is there something I missed in the libs that would make this code nicer?

The implementation should perform reasonably with 100k items.

Context: I would like to sort the items of a dictionary that represents a histogram. The keys are tuples (a, b) and the value is the count. In the end the items should be sorted by count descending, then by a and b. An alternative is to flatten the tuple and use itemgetter directly, but this way a lot of tuples will be generated.

Yes, you could just use key=lambda x: x[0][1]
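As a quick check, here is a small sketch using the sample list from the question; the variable names by_a and by_b are just for illustration. Plain lambdas give the same orderings as deep_get(0, 0) and deep_get(0, 1):

```python
l = [((2, 1), 1), ((1, 3), 1), ((3, 6), 1), ((4, 5), 2)]

# Sort by the first element of the nested tuple (a).
by_a = sorted(l, key=lambda x: x[0][0])

# Sort by the second element of the nested tuple (b).
by_b = sorted(l, key=lambda x: x[0][1])
```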

Your approach is quite good, given the data structure that you have.

Another approach would be to use another structure.

If you want speed, the de facto standard NumPy is the way to go. Its job is to efficiently handle large arrays. It even has some nice sorting routines for arrays like yours. Here is how you would write your sort over the counts, and then over (a, b):

>>> import numpy
>>> arr = numpy.array([((2,1), 1),((1,3), 1),((3,6), 1),((4,5), 2)],
...                   dtype=[('pos', [('a', int), ('b', int)]), ('count', int)])
>>> print(numpy.sort(arr, order=['count', 'pos']))
[((1, 3), 1) ((2, 1), 1) ((3, 6), 1) ((4, 5), 2)]

This is very fast (it's implemented in C).

If you want to stick with standard Python, a list of (count, a, b) tuples would automatically be sorted the way you want, because Python uses lexicographic order on tuples.
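A minimal sketch of that idea, assuming the histogram is a dict named hist (a made-up name) with the sample data from the question. Storing the negated count is an extra assumption on my part, so that the default ascending tuple order yields count descending as the question asks:

```python
hist = {(2, 1): 1, (1, 3): 1, (3, 6): 1, (4, 5): 2}

# Flatten each item to (-count, a, b); plain tuple comparison then sorts
# by count descending, then a ascending, then b ascending.
rows = sorted((-count, a, b) for (a, b), count in hist.items())

# Rebuild the original ((a, b), count) shape, undoing the negation.
result = [((a, b), -neg_count) for neg_count, a, b in rows]
```

No key function is needed here, at the cost of building one extra tuple per item.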

This might be a slightly faster version of your approach:

l = [((2,1), 1), ((1,3), 1), ((3,6), 1), ((4,5), 2)]

from functools import reduce  # a builtin on Python 2; Python 3 needs this import

def deep_get(*idx):
    def g(t):
        return reduce(lambda t, i: t[i], idx, t)
    return g

>>> sorted(l, key=deep_get(0,1))
[((2, 1), 1), ((1, 3), 1), ((4, 5), 2), ((3, 6), 1)]

Which could be shortened to:

def deep_get(*idx):
    return lambda t: reduce(lambda t, i: t[i], idx, t)

or even simply written out:

sorted(l, key=lambda t: reduce(lambda t, i: t[i], (0,1), t))

I compared two similar solutions. The first one uses a simple lambda:

def sort_one(d):
    result = list(d.items())  # list() so this also works on Python 3, where items() is a view
    result.sort(key=lambda x: (-x[1], x[0]))
    return result

Note the minus on x[1], because you want the sort to be descending on count.

The second one takes advantage of the fact that sort in Python is stable. First, we sort by (a, b) (ascending). Then we sort by count, descending:

from operator import itemgetter

def sort_two(d):
    result = list(d.items())  # list() so this also works on Python 3
    result.sort()
    result.sort(key=itemgetter(1), reverse=True)
    return result

The first one is 10-20% faster (on both small and large datasets), and both complete in under 0.5 s on my Q6600 (one core used) for 100k items. So avoiding the creation of tuples doesn't seem to help much.
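To reproduce the comparison, a rough sketch along these lines should do. The random test data, the seed, and the repeat count are my own assumptions, not the setup used for the numbers above, and the two functions are restated with sorted() so the block runs on Python 3:

```python
import random
import timeit
from operator import itemgetter

def sort_one(d):
    # Single pass: negate the count so it sorts descending, then (a, b).
    return sorted(d.items(), key=lambda x: (-x[1], x[0]))

def sort_two(d):
    # Two passes relying on sort stability: (a, b) first, then count descending.
    result = sorted(d.items())
    result.sort(key=itemgetter(1), reverse=True)
    return result

# Hypothetical histogram: 100k distinct (a, b) keys in shuffled order.
random.seed(0)
keys = [(i // 1000, i % 1000) for i in range(100000)]
random.shuffle(keys)
hist = {k: random.randrange(50) for k in keys}

t1 = timeit.timeit(lambda: sort_one(hist), number=3)
t2 = timeit.timeit(lambda: sort_two(hist), number=3)
print("sort_one: %.3fs  sort_two: %.3fs (3 runs each)" % (t1, t2))
```

Both functions should return the same list, since reverse=True keeps items with equal counts in their existing (a, b) order; the timings will of course vary by machine and Python version.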
