
Identifying lists that have 3 elements in common in a list of lists

I have a list of lists. If any sublists have the same first three elements, I want to merge them into one list and sum their fourth elements.

The problem is best explained in code and the required output.

a_list = [['apple', 50, 60, 7],
          ['orange', 70, 50, 8],
          ['apple', 50, 60, 12]]

# output:
# [['apple', 50, 60, 19], ['orange', 70, 50, 8]]

I already have code for a similar problem (given to me by another user on Stack Overflow some time ago), but I don't understand it completely, so I'm unable to modify it accordingly. This code checks whether the 0th and 2nd elements are the same; if they are, it merges the sublists, adding up the 1st and 3rd elements:

from collections import defaultdict
data = [['42x120x1800', 50, '50x90x800', 60],
        ['42x120x1800', 8, '50x90x800', 10],
        ['2x10x800', 5, '5x9x80', 6]]

d = defaultdict(lambda :[0, 0])
for sub_list in data:
    key = (sub_list[0], sub_list[2])
    d[key][0] += sub_list[1]
    d[key][1] += sub_list[3]

new_data = [[key[0], val[0], key[1], val[1]] for key, val in d.iteritems()]
# [['2x10x800', 5, '5x9x80', 6], ['42x120x1800', 58, '50x90x800', 70]]

How should the code be modified to fit to my new problem? I'd really appreciate if you could also take the time and explain the code thoroughly, too.

You can use the same principle, by using the first three elements as a key, and using int as the default value factory for the defaultdict (so you get 0 as the initial value):

from collections import defaultdict

a_list = [['apple', 50, 60, 7],
          ['orange', 70, 50, 8],
          ['apple', 50, 60, 12]]

d = defaultdict(int)
for sub_list in a_list:
    key = tuple(sub_list[:3])
    d[key] += sub_list[-1]

new_data = [list(k) + [v] for k, v in d.iteritems()]

If you are using Python 3, you can simplify this to:

d = defaultdict(int)
for *key, v in a_list:
    d[tuple(key)] += v

new_data = [list(k) + [v] for k, v in d.items()]

This works because a starred target collects all 'remaining' values from a list: the first three values of each sublist are assigned to key (as a list) and the last value is assigned to v, which makes the loop a little simpler. There is also no .iteritems() method on a dict in Python 3; there, .items() already returns a lightweight view instead of building a list.
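
As a small illustration of the starred target on a single sublist (Python 3 only; the name key_tuple is introduced here just for the example):

```python
# Starred assignment: the last name takes the final element, and the
# starred name collects everything before it into a list.
row = ['apple', 50, 60, 7]
*key, v = row

print(key)   # ['apple', 50, 60]
print(v)     # 7

# A list is not hashable, so it has to be converted to a tuple
# before it can be used as a dictionary key.
key_tuple = tuple(key)
print(key_tuple)   # ('apple', 50, 60)
```

This is why the loop body calls tuple(key) before indexing into d.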

So, we use a defaultdict that uses 0 as the default value, then for each key generated from the first 3 values (as a tuple so you can use it as a dictionary key) sum the last value.

  • So for the first item ['apple', 50, 60, 7] we create a key ('apple', 50, 60) , look that up in d (where it doesn't exist, but defaultdict will then use int() to create a new value of 0 ), and add the 7 from that first item.

  • Do the same for the ('orange', 70, 50) key and value 8 .

  • For the 3rd item we get the ('apple', 50, 60) key again and add 12 to the pre-existing 7 in d[('apple', 50, 60)], for a total of 19.
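
The three steps above can be traced by hand, as in this minimal sketch (Python 3, with the keys written out literally instead of being built from a_list):

```python
from collections import defaultdict

d = defaultdict(int)
key = ('apple', 50, 60)

print(key in d)              # False: nothing stored yet
d[key] += 7                  # missing key: int() supplies 0, then 7 is added
print(d[key])                # 7

d[('orange', 70, 50)] += 8   # second item, new key
d[key] += 12                 # third item, existing key: 7 + 12

print(dict(d))
# {('apple', 50, 60): 19, ('orange', 70, 50): 8}
```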

Then we turn the (key, value) pairs back into lists and you are done. This results in:

>>> new_data
[['apple', 50, 60, 19], ['orange', 70, 50, 8]]

An alternative implementation that requires sorting the data uses itertools.groupby :

from itertools import groupby
from operator import itemgetter

a_list = [['apple', 50, 60, 7],
          ['orange', 70, 50, 8],
          ['apple', 50, 60, 12]]

newlist = [list(key) + [sum(i[-1] for i in sublists)] 
    for key, sublists in groupby(sorted(a_list), key=itemgetter(0, 1, 2))]

This produces the same output. Because of the sorting step, it is going to be slower than the defaultdict approach when your data isn't already sorted, but it's good to know of different approaches.
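
The sort matters because groupby only merges consecutive items with equal keys. A short sketch of the difference (Python 3; merge and first_three are hypothetical names used only for this illustration):

```python
from itertools import groupby
from operator import itemgetter

a_list = [['apple', 50, 60, 7],
          ['orange', 70, 50, 8],
          ['apple', 50, 60, 12]]

first_three = itemgetter(0, 1, 2)

def merge(rows):
    # groupby starts a new group every time the key changes, so equal
    # keys that are not adjacent end up in separate groups.
    return [list(key) + [sum(row[-1] for row in group)]
            for key, group in groupby(rows, key=first_three)]

unsorted_result = merge(a_list)
sorted_result = merge(sorted(a_list, key=first_three))

print(unsorted_result)
# [['apple', 50, 60, 7], ['orange', 70, 50, 8], ['apple', 50, 60, 12]]
print(sorted_result)
# [['apple', 50, 60, 19], ['orange', 70, 50, 8]]
```

Without sorting, the two 'apple' rows are separated by the 'orange' row and are never merged.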

I'd do something like this:

>>> a_list = [['apple', 50, 60, 7],
...           ['orange', 70, 50, 8],
...           ['apple', 50, 60, 12]]
>>> 
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> from operator import itemgetter
>>> getter = itemgetter(0,1,2)
>>> for lst in a_list:
...     d[getter(lst)].extend(lst[3:])
... 
>>> d
defaultdict(<type 'list'>, {('apple', 50, 60): [7, 12], ('orange', 70, 50): [8]})
>>> print [list(k)+v for k,v in d.items()]
[['apple', 50, 60, 7, 12], ['orange', 70, 50, 8]]

This doesn't give the sum, however. It could easily be fixed by doing:

print [list(k)+[sum(v)] for k,v in d.items()]

There isn't much of a reason to prefer this over the slightly more elegant solution by Martijn, other than that it allows input lists with more than 4 items (with all elements after the third being summed, as expected). In other words, this would handle the list:

a_list = [['apple', 50, 60, 7, 12],
          ['orange', 70, 50, 8]]

as well.
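
For Python 3, the same idea could be sketched as below. This is a variation rather than the exact code above: it sums the trailing values directly instead of collecting them in a list first, and the function name merge_rows and its key_len parameter are made up for this example.

```python
from collections import defaultdict

def merge_rows(rows, key_len=3):
    """Merge rows sharing their first key_len elements, summing the rest."""
    totals = defaultdict(int)
    for row in rows:
        # Everything after the key is summed, so rows may carry one or
        # several trailing numbers.
        totals[tuple(row[:key_len])] += sum(row[key_len:])
    return [list(key) + [total] for key, total in totals.items()]

print(merge_rows([['apple', 50, 60, 7],
                  ['orange', 70, 50, 8],
                  ['apple', 50, 60, 12]]))
# [['apple', 50, 60, 19], ['orange', 70, 50, 8]]

print(merge_rows([['apple', 50, 60, 7, 12],
                  ['orange', 70, 50, 8]]))
# [['apple', 50, 60, 19], ['orange', 70, 50, 8]]
```

The result order relies on dicts preserving insertion order, which is guaranteed from Python 3.7 onwards.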

Form the key with [:3] so that you get the first 3 elements.
