Itertools groupby with lambda function, group sublists of a list together if they have matching values at indices 0 and 1
I have a list like this:
data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]
I want to group the sublists together if their first two values are the same. The output would be:
data = [(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
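Sublists with the same first two values are always adjacent to each other in the list, but the number of sublists in each group varies.
I tried this: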
from itertools import groupby
data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]
output = [list(group) for key, group in groupby(data, lambda x:x[0])]
new_data = []
for l in output:
    new_output = [tuple(group) for key, group in groupby(l, lambda x:x[1])]
    for grouped_sub in new_output:
        new_data.append(grouped_sub)
print(new_data)
and got the output:
[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
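This is exactly what I was looking for. However, my list of lists has len(data) = 1000000, and I know it would probably be more efficient if I could skip the for loop entirely and somehow get the groupby lambda to consider both x[0] and x[1] when grouping. But I don't really understand how the groupby lambda function works.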
Modify the key lambda to return a tuple containing both elements:
groupby(data, lambda x: tuple(x[0:2]))
That way it can be done in a single for loop/list comprehension:
>>> [tuple(group) for key, group in groupby(data, lambda x: tuple(x[0:2]))]
[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]),
(['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]),
(['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
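To see why this works, it may help to look at the key that groupby computes for each run of consecutive items. The following is a minimal sketch using the data from the question; the name keys_and_sizes is only illustrative and not part of the original answer.

```python
from itertools import groupby

data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500],
        ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000],
        ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]

# groupby calls the key function on every item and starts a new group
# each time the returned key differs from the previous item's key.
# keys_and_sizes is a hypothetical name used only for this illustration.
keys_and_sizes = [(key, len(list(group)))
                  for key, group in groupby(data, lambda x: tuple(x[0:2]))]
print(keys_and_sizes)
# [(('a', 'b'), 2), (('c', 'd'), 3), (('a', 'd'), 2)]
```

Because groupby only compares each key with the previous one, this relies on matching sublists being adjacent, which the question says is always the case.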
Why not just group by the first 2 items directly:
from itertools import groupby
data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]
res = [tuple(g) for k, g in groupby(data, key=lambda x: x[:2])]
print(res)
Output:
[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
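The slice x[:2] returns a list rather than a tuple, which is fine as a key here because groupby only compares consecutive keys for equality. As an alternative sketch, operator.itemgetter(0, 1) builds the same two-element key as a tuple without a lambda. The names first_two and same_result are illustrative, and whether this is measurably faster on 1,000,000 rows is something to time on your own data rather than assume.

```python
from itertools import groupby
from operator import itemgetter

data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500],
        ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000],
        ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]

# itemgetter(0, 1) returns the tuple (x[0], x[1]) for each sublist.
first_two = itemgetter(0, 1)
res = [tuple(g) for _, g in groupby(data, key=first_two)]

# Compare against the slice-based key to confirm both give the same grouping.
same_result = res == [tuple(g) for _, g in groupby(data, key=lambda x: x[:2])]
print(same_result)  # True
```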