
Python: how to group similar lists together in a list of lists?

I have a list of lists in Python and want to group similar lists together. That is, if the first three elements of several lists are the same, those lists should go in one group. For example:

[["a", "b", "c", 1, 2],
 ["d", "f", "g", 8, 9],
 ["a", "b", "c", 3, 4],
 ["d", "f", "g", 3, 4],
 ["a", "b", "c", 5, 6]]

I want this to look like:

[[["a", "b", "c", 1, 2],
  ["a", "b", "c", 5, 6],
  ["a", "b", "c", 3, 4]],
 [["d", "f", "g", 3, 4],
  ["d", "f", "g", 8, 9]]]

I could do this by iterating over the list, manually comparing the elements of each pair of consecutive lists, and deciding whether to group them based on how many of those elements match. But I was wondering whether there is a more Pythonic way to do this.

You can use itertools.groupby on the sorted list:

>>> A=[["a", "b", "c", 1, 2],
...    ["d", "f", "g", 8, 9],
...    ["a", "b", "c", 3, 4],
...    ["d","f", "g", 3, 4],
...    ["a", "b", "c", 5, 6]]
>>> from itertools import groupby
>>> from operator import itemgetter
>>> # groupby only merges adjacent items, so sort first
>>> [list(g) for _, g in groupby(sorted(A), itemgetter(0, 1, 2))]
[[['a', 'b', 'c', 1, 2], ['a', 'b', 'c', 3, 4], ['a', 'b', 'c', 5, 6]], [['d', 'f', 'g', 3, 4], ['d', 'f', 'g', 8, 9]]]
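
The sort matters because groupby only merges items that are adjacent. A minimal sketch of the difference, using a shortened copy of the data (the variable names here are illustrative, not from the answer):

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ["a", "b", "c", 1, 2],
    ["d", "f", "g", 8, 9],
    ["a", "b", "c", 3, 4],
]

key = itemgetter(0, 1, 2)  # key is the tuple of the first three elements

# Without sorting, the two "a, b, c" rows are not adjacent,
# so groupby starts a new group each time the key changes.
unsorted_groups = [list(g) for _, g in groupby(rows, key)]
print(len(unsorted_groups))  # 3

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [list(g) for _, g in groupby(sorted(rows, key=key), key)]
print(len(sorted_groups))  # 2
```

Sorting with key=key compares only the first three elements, which avoids comparing the trailing values if they happen to have mixed types.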

You don't need to sort, you can group in a dict using a tuple of the first three elements from each list as the key:

from collections import OrderedDict
from pprint import pprint as pp

l = [
    ["a", "b", "c", 1, 2],
    ["d", "f", "g", 8, 9],
    ["a", "b", "c", 3, 4],
    ["d", "f", "g", 3, 4],
    ["a", "b", "c", 5, 6],
]

od = OrderedDict()
for sub in l:
    # lists are unhashable, so use a tuple of the first three elements as the key
    k = tuple(sub[:3])
    od.setdefault(k, []).append(sub)

pp(list(od.values()))  # list() is needed on Python 3, where values() returns a view
[[['a', 'b', 'c', 1, 2], ['a', 'b', 'c', 3, 4], ['a', 'b', 'c', 5, 6]],
 [['d', 'f', 'g', 8, 9], ['d', 'f', 'g', 3, 4]]]

This is O(n) on average, as opposed to O(n log n) for the sort-based approach, and it keeps the groups in order of first appearance.
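
The same idea can be wrapped in a small reusable helper. This is only a sketch, assuming Python 3.7+ (where a plain dict preserves insertion order); the function name group_by_prefix and its n parameter are hypothetical and not part of the answer:

```python
def group_by_prefix(rows, n=3):
    """Group rows whose first n elements are equal, in order of first appearance."""
    groups = {}
    for row in rows:
        # A tuple of the first n elements is hashable, so it can be a dict key.
        groups.setdefault(tuple(row[:n]), []).append(row)
    return list(groups.values())


rows = [
    ["a", "b", "c", 1, 2],
    ["d", "f", "g", 8, 9],
    ["a", "b", "c", 3, 4],
    ["d", "f", "g", 3, 4],
    ["a", "b", "c", 5, 6],
]

groups = group_by_prefix(rows)
print(len(groups))  # 2
```

Each row is visited once and each dict operation is constant time on average, which is where the linear running time comes from.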

If you don't care about order, use a defaultdict (on Python 3.7+ every dict, including defaultdict, preserves insertion order anyway):

from collections import defaultdict
from pprint import pprint as pp

od = defaultdict(list)  # missing keys start out as an empty list
for sub in l:
    a, b, c, *_ = sub  # extended unpacking, Python 3 only
    k = a, b, c
    od[k].append(sub)

pp(list(od.values()))
[[['a', 'b', 'c', 1, 2], ['a', 'b', 'c', 3, 4], ['a', 'b', 'c', 5, 6]],
 [['d', 'f', 'g', 8, 9], ['d', 'f', 'g', 3, 4]]]
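
One difference between the two ways of building the key shows up when a sub-list has fewer than three elements. A small sketch of that edge case (the short list is made-up data, not from the question):

```python
short = ["a", "b"]  # hypothetical row with fewer than three elements

# Slicing never fails; it just yields a shorter key.
key_from_slice = tuple(short[:3])
print(key_from_slice)  # ('a', 'b')

# Extended unpacking needs at least three elements and raises otherwise.
try:
    a, b, c, *_ = short
    unpack_error = None
except ValueError as exc:
    unpack_error = exc

print(type(unpack_error).__name__)  # ValueError
```

Which behavior is preferable depends on whether short rows should be grouped under their own key or treated as bad input.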

