How do I use itertools.groupby when the key value is in the elements of the iterable?

Question

To illustrate, I start with a list of 2-tuples:

import itertools
import operator

raw = [(1, "one"),
       (2, "two"),
       (1, "one"),
       (3, "three"),
       (2, "two")]

for key, grp in itertools.groupby(raw, key=lambda item: item[0]):
    print(key, list(grp).pop()[1])

This yields:

1 one
2 two
1 one
3 three
2 two

In an attempt to investigate why:

for key, grp in itertools.groupby(raw, key=lambda item: item[0]):
    print(key, list(grp))

# ---- OUTPUT ----
1 [(1, 'one')]
2 [(2, 'two')]
1 [(1, 'one')]
3 [(3, 'three')]
2 [(2, 'two')]

Even this gives me the same output:

for key, grp in itertools.groupby(raw, key=operator.itemgetter(0)):
    print(key, list(grp))

I want to get something like:

1 one, one
2 two, two
3 three

I think this is because the key sits inside each tuple in the list, while the tuple itself gets moved around as a single item. Is there a way to get my desired output? Maybe groupby() isn't suited to this task?

Answer 1

groupby clusters consecutive elements of the iterable that have the same key. To produce the output you want, you must first sort raw.

for key, grp in itertools.groupby(sorted(raw), key=operator.itemgetter(0)):
    # list() is needed on Python 3, where map() returns a lazy iterator
    print(key, list(map(operator.itemgetter(1), grp)))

# 1 ['one', 'one']
# 2 ['two', 'two']
# 3 ['three']
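
As a side note, sorted(raw) orders by the whole tuple, so the values inside each group come out sorted as well. If the input order within a group matters, one option is to sort with the same key function that groupby uses; sorted() is stable, so ties keep their original order. A minimal sketch, assuming Python 3 and a made-up variant of the data ("uno" and "dos" are not in the question) so that the difference is visible:

```python
import itertools
import operator

# Hypothetical data: same keys as in the question, but distinct values per key.
raw = [(1, "one"), (2, "two"), (1, "uno"), (3, "three"), (2, "dos")]

first = operator.itemgetter(0)

# Sort by the key only. sorted() is stable, so values keep their input order.
grouped = {
    key: [value for _, value in grp]
    for key, grp in itertools.groupby(sorted(raw, key=first), key=first)
}

print(grouped)
# {1: ['one', 'uno'], 2: ['two', 'dos'], 3: ['three']}
```

With plain sorted(raw), the group for key 2 would come out as ['dos', 'two'] instead.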

Answer 2

I think this is a cleaner way to get your desired result.

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for k, v in raw:
...     d[k].append(v)
...
>>> for k, v in sorted(d.items()):
...     print(k, v)
...
1 ['one', 'one']
2 ['two', 'two']
3 ['three']

Building d is O(n), and sorted() now runs over just the unique keys instead of the entire dataset.
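
The same idea can be wrapped in a small helper and used to print exactly the format asked for in the question. This is only a sketch, assuming Python 3; the name group_values is made up for illustration:

```python
from collections import defaultdict


def group_values(pairs):
    """Collect values by key in a single pass; the input need not be sorted."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]

result = group_values(raw)
# Only the unique keys are sorted here, not the whole dataset.
for key in sorted(result):
    print(key, ", ".join(result[key]))
# 1 one, one
# 2 two, two
# 3 three
```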

Answer 3

From the docs:

The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL's GROUP BY which aggregates common elements regardless of their input order.

Since you are sorting the tuples lexicographically anyway, you can just call sorted:

for key, grp in itertools.groupby(sorted(raw), key=operator.itemgetter(0)):
    print(key, list(map(operator.itemgetter(1), grp)))
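
To make the uniq comparison concrete, here is a small sketch (assuming Python 3) that counts the length of each run groupby produces, using only the keys from the question's data. The variable names are made up for illustration:

```python
import itertools

keys = [1, 2, 1, 3, 2]

# Unsorted: every change of key starts a new group, like uniq on unsorted lines.
runs_unsorted = [(k, len(list(g))) for k, g in itertools.groupby(keys)]

# Sorted: equal keys are adjacent, so each key yields exactly one group.
runs_sorted = [(k, len(list(g))) for k, g in itertools.groupby(sorted(keys))]

print(runs_unsorted)  # [(1, 1), (2, 1), (1, 1), (3, 1), (2, 1)]
print(runs_sorted)    # [(1, 2), (2, 2), (3, 1)]
```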

3 answers

Answer 1
11 accepted 2010-08-09 13:42:26

Answer 2
6 2010-08-09 22:30:28

Answer 3
2 2010-08-09 13:45:47
