I have a list of lists like this:
data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]
and I want to group them together if they have the same first two values. Output would be:
data = [(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
The sublists with the same first two values are always adjacent to each other in the list, but the number of sublists in each group varies.
I tried this:
from itertools import groupby
data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]
output = [list(group) for key, group in groupby(data, lambda x:x[0])]
new_data = []
for l in output:
    new_output = [tuple(group) for key, group in groupby(l, lambda x:x[1])]
    for grouped_sub in new_output:
        new_data.append(grouped_sub)
print(new_data)
and got the output:
[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
Which is exactly what I was looking for. However, my real list of lists has len(data) = 1000000, and I know this could be much more efficient if I could skip the for loops entirely and somehow get the groupby lambda to consider both x[0] and x[1] when grouping. I do not really understand how lambda functions in groupby work yet.
Modify the key lambda to return a tuple containing both elements:
groupby(data, lambda x: tuple(x[0:2]))
That is, the whole grouping can be done in a single list comprehension:
>>> [tuple(group) for key, group in groupby(data, lambda x: tuple(x[0:2]))]
[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]),
(['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]),
(['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
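Since the question mentions not being sure how the key function works, here is a small sketch of what groupby does with it. It uses operator.itemgetter(0, 1) as the key, which is a standard-library equivalent of the lambda above, not something the original code uses; the names first_two, keys and grouped are only for illustration.

```python
from itertools import groupby
from operator import itemgetter

data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500],
        ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000],
        ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]

# itemgetter(0, 1)(row) returns the tuple (row[0], row[1]),
# i.e. the same key that the lambda above builds.
first_two = itemgetter(0, 1)

# groupby calls the key function once per row and starts a new group
# whenever the returned key differs from the previous row's key.
keys = [key for key, group in groupby(data, key=first_two)]
grouped = [tuple(group) for key, group in groupby(data, key=first_two)]

print(keys)     # one key per run of adjacent rows
print(grouped)  # the rows of each run, as tuples
```

The key is computed per row, so returning both values at once is enough to make groupby compare them together; no nested loop is needed.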
Why not just group by the first two items directly:
from itertools import groupby
data = [['a', 'b', 2000, 100], ['a', 'b', 4000, 500], ['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000], ['a', 'd', 2000, 100], ['a', 'd', 1000, 100]]
res = [tuple(g) for k, g in groupby(data, key=lambda x: x[:2])]
print(res)
The output:
[(['a', 'b', 2000, 100], ['a', 'b', 4000, 500]), (['c', 'd', 500, 8000], ['c', 'd', 60, 8000], ['c', 'd', 70, 1000]), (['a', 'd', 2000, 100], ['a', 'd', 1000, 100])]
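One caveat worth sketching: groupby only merges rows that are next to each other, which is fine here because the question says matching sublists are always adjacent. The example below uses a made-up input (unsorted_rows) where that does not hold, and shows a plain dict as one possible fallback. Note that groupby compares keys with ==, so the list returned by x[:2] works as a key there, while a dict needs a hashable key, hence the tuple in the hypothetical helper first_two.

```python
from itertools import groupby

# Hypothetical input: the ('a', 'b') rows are NOT adjacent here.
unsorted_rows = [['a', 'b', 1], ['c', 'd', 2], ['a', 'b', 3]]

def first_two(row):
    # Tuple (not list) so the key can also be used in a dict.
    return tuple(row[:2])

# groupby only merges consecutive rows, so ('a', 'b') shows up as two groups.
runs = [tuple(g) for _, g in groupby(unsorted_rows, key=first_two)]

# Collecting rows into a dict groups all matching rows, adjacent or not.
# Dicts keep insertion order (Python 3.7+), so groups appear in the order
# their key was first seen.
buckets = {}
for row in unsorted_rows:
    buckets.setdefault(first_two(row), []).append(row)
by_key = [tuple(rows) for rows in buckets.values()]

print(len(runs))    # number of consecutive runs
print(len(by_key))  # number of distinct keys
```

If the data is guaranteed to be adjacent, as in the question, the single groupby call is enough and avoids building the extra dict.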