Itertools groupby with lambda function: group sublists of a list together if they have matching values at indices 0 and 1
I have a list of lists like this:
data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]
and I want to group the sublists together if they have the same first two values. The output would be:
data = [(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
Sublists with the same first two values are always adjacent to each other in the list, but the number of sublists in each group varies.
I tried this:
from itertools import groupby
data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]
# First pass: group consecutive sublists by the value at index 0
output = [list(group) for key, group in groupby(data, lambda x:x[0])]
new_data = []
for l in output:
    # Second pass: within each first-pass group, group by the value at index 1
    new_output = [tuple(group) for key, group in groupby(l, lambda x:x[1])]
    for grouped_sub in new_output:
        new_data.append(grouped_sub)
print(new_data)
and got the output:
[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
This is exactly what I was looking for. However, my real list of lists has len(data) = 1000000, and I suspect this could be much more efficient if I could skip the for loops entirely and somehow get the groupby lambda to consider both x[0] and x[1] when grouping. I do not yet understand very well how lambda functions work in groupby.
Modify the key lambda to return a tuple containing both elements:
groupby(data, lambda x: tuple(x[0:2]))
That is, it can be done in a single for-loop / list comprehension:
>>> [tuple(group) for key, group in groupby(data, lambda x: tuple(x[0:2]))]
[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]),
(['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]),
(['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
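Since the question asks how the key function in groupby works, here is a minimal sketch of the mechanics. groupby calls the key function on each element in order and starts a new group every time the returned key differs from the previous one. The rows list and the first_two helper name are made up for illustration; first_two is meant to behave like the lambda above.

```python
from itertools import groupby

# Illustrative data (not from the question); note ('a', 'b') appears twice,
# separated by a ('c', 'd') row.
rows = [['a', 'b', 1], ['a', 'b', 2], ['c', 'd', 3], ['a', 'b', 4]]

def first_two(row):
    # Hypothetical helper; does the same as lambda x: tuple(x[0:2])
    return tuple(row[0:2])

# groupby yields (key, group_iterator) pairs; materialize each group as a list
grouped = [(key, list(group)) for key, group in groupby(rows, key=first_two)]

for key, group in grouped:
    print(key, group)
```

The ('a', 'b') key shows up in two separate groups here because groupby only merges consecutive elements with equal keys. That is fine for the question's data, where matching sublists are always adjacent; otherwise the list would need to be sorted by the same key first.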
Why not just group by the first 2 items directly:
from itertools import groupby
data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]
res = [tuple(g) for k, g in groupby(data, key=lambda x: x[:2])]
print(res)
The output:
[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
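As a possible variation on the same idea, operator.itemgetter(0, 1) can stand in for the lambda: it returns the tuple (x[0], x[1]) for each sublist. This is only a sketch; whether it is measurably faster than the lambda on a list of 1000000 items is an assumption that would need to be timed on the real data.

```python
from itertools import groupby
from operator import itemgetter

data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000],
        ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100],
        ['a', 'd', 1000, 100]]

# itemgetter(0, 1)(x) returns the tuple (x[0], x[1]), used here as the grouping key
res = [tuple(g) for k, g in groupby(data, key=itemgetter(0, 1))]
print(res)
```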