Itertools groupby with a lambda function: group a list's sublists together if they have matching values at index 0 and 1

Question

I have a list of lists like this:

data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]

and I want to group the sublists together if they have the same first two values. The desired output is:

data = [(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]

Sublists with the same first two values are always adjacent to each other in the list, but the number of sublists in each group varies.

I tried this:

from itertools import groupby
data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]
output = [list(group) for key, group in groupby(data, lambda x:x[0])]

new_data = []
for l in output:
    new_output = [tuple(group) for key, group in groupby(l, lambda x:x[1])]
    for grouped_sub in new_output:
        new_data.append(grouped_sub)

print(new_data)

and got the output:

[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]

This is exactly what I was looking for. However, my list of lists has len(data) = 1000000, and I suspect this could be much more efficient if I could skip the for loops entirely and somehow get the groupby lambda to consider both x[0] and x[1] when grouping. I do not yet understand very well how lambda functions in groupby work.

Answer 1

Modify the key lambda to return a tuple containing both elements:

groupby(data, lambda x: tuple(x[0:2]))

That is, it can be done in a single for loop or list comprehension:

>>> [tuple(group) for key, group in groupby(data, lambda x: tuple(x[0:2]))]
[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), 
 (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), 
 (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
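
To see what the key function is doing, it can help to look at the key that groupby computes for each run of sublists. The sketch below uses operator.itemgetter(0, 1) in place of the lambda; that swap is an alternative chosen here for illustration, not something the answer above prescribes. itemgetter(0, 1) returns the tuple (x[0], x[1]), so it should behave like lambda x: tuple(x[0:2]) for this data. The names first_two, keys_and_sizes, and grouped are made up for this example.

```python
from itertools import groupby
from operator import itemgetter

data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500],
        ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000],
        ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]

# itemgetter(0, 1) returns the tuple (x[0], x[1]) for each sublist x
# (hypothetical helper name, used in place of the lambda).
first_two = itemgetter(0, 1)

# Pair each key with the length of its run of consecutive sublists.
keys_and_sizes = [(key, len(list(group)))
                  for key, group in groupby(data, key=first_two)]
print(keys_and_sizes)
# [(('a', 'b'), 2), (('c', 'd'), 3), (('a', 'd'), 2)]

# Same grouping as in the answer, built with the itemgetter key.
grouped = [tuple(group) for _, group in groupby(data, key=first_two)]
```

Each group iterator has to be consumed (here with list or tuple) before moving on to the next key, because groupby shares one underlying iterator across all groups.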

Answer 2

Why not just group by the first two items directly:

from itertools import groupby

data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]
res = [tuple(g) for k, g in groupby(data, key=lambda x: x[:2])]
print(res)

The output:

[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
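
One caveat that may be worth illustrating: groupby only merges consecutive items whose keys compare equal, so this approach relies on the question's guarantee that matching sublists are always adjacent. The sketch below uses a small hypothetical list (unsorted, invented for this example) where that guarantee does not hold, and shows one possible workaround, sorting by the same key first. The helper name first_two is also an assumption of this sketch.

```python
from itertools import groupby

# Hypothetical input where matching sublists are NOT adjacent.
unsorted = [['a', 'b', 1], ['c', 'd', 2], ['a', 'b', 3]]

def first_two(x):
    # Same key as the answer's lambda: the first two items as a list slice.
    return x[:2]

# Without sorting, the two ['a', 'b', ...] sublists land in separate groups.
runs = [tuple(g) for _, g in groupby(unsorted, key=first_two)]
print(len(runs))  # 3

# Sorting by the same key first makes equal keys adjacent.
merged = [tuple(g) for _, g in groupby(sorted(unsorted, key=first_two), key=first_two)]
print(merged)
# [(['a', 'b', 1], ['a', 'b', 3]), (['c', 'd', 2],)]
```

Sorting adds O(n log n) work, so for data that is already adjacent, as in the question, it is probably better to skip it.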

Answer 1
6 2019-08-21 15:24:49

Answer 2
3 Accepted 2019-08-21 15:25:13
