
How can I remove duplicates from a list while summing their numeric values?

I have a list that looks like this:

[('A54', 'ItemName1 ', '18'), ('B52', 'ItemName2 ', '51'), ('C45', 'ItemName3 ', '3'), ('A54', ' ItemName1', '15'), ('G22', ' ItemName5', '78')]

The first item in each tuple is the item number, the second is the item name, and the third is the quantity.

What would be the best way to remove duplicate instances from the list while adding the total quantity of items to them?

I've tried sorting the list alphabetically using list.sort(), but for some reason it doesn't work.

My sorting attempt looks like this:

L = [('A54', 'ItemName1 ', '18'), ('B52', 'ItemName2 ', '51'), ('C45', 'ItemName3 ', '3'), ('A54', ' ItemName1', '15'), ('G22', ' ItemName5', '78')]
L.sort()

print (L)

The result is always None.

You're probably doing L = L.sort(), which explains the None result (a classic issue; see Why does "return list.sort()" return None, not the list?).
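A minimal illustration of that pitfall, using a shortened two-item list chosen here only for brevity: list.sort() sorts in place and returns None, while sorted() returns a new list.

```python
L = [('B52', 'ItemName2', '51'), ('A54', 'ItemName1', '18')]

# list.sort() sorts in place and returns None, so keeping its return value loses the list
result = L.copy().sort()
print(result)  # None

# sorted() returns a new sorted list and leaves the original untouched
sorted_copy = sorted(L)
print(sorted_copy)  # [('A54', 'ItemName1', '18'), ('B52', 'ItemName2', '51')]
```

So either call L.sort() on its own line and then use L, or assign the result of sorted(L).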

Anyway, sorting and then grouping (for instance with itertools.groupby) isn't the best way here. The complexity is poor: O(n*log(n)) for the sort plus O(n) for the grouping pass.
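For comparison, the sort-then-group approach could look roughly like the sketch below. It assumes the item names have already been stripped of stray spaces; the names key and grouped are chosen here for illustration.

```python
from itertools import groupby
from operator import itemgetter

L = [('A54', 'ItemName1', '18'), ('B52', 'ItemName2', '51'), ('C45', 'ItemName3', '3'),
     ('A54', 'ItemName1', '15'), ('G22', 'ItemName5', '78')]

# groupby only merges adjacent items with equal keys, so the list must be sorted by that key first
key = itemgetter(0, 1)
grouped = [
    (number, name, sum(int(qty) for _, _, qty in group))
    for (number, name), group in groupby(sorted(L, key=key), key=key)
]
print(grouped)
# [('A54', 'ItemName1', 33), ('B52', 'ItemName2', 51), ('C45', 'ItemName3', 3), ('G22', 'ItemName5', 78)]
```

It works, but the sort is what makes it O(n*log(n)).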

Instead, create a collections.defaultdict and "count" your items (collections.Counter doesn't work directly here, as the amount to add depends on the value of the third element, converted to an integer).

Then rebuild the triplets by unpacking the dictionary keys & values.

import collections

L = [('A54', 'ItemName1', '18'), ('B52', 'ItemName2', '51'),('C45', 'ItemName3', '3'),('A54', 'ItemName1', '15'), ('G22', 'ItemName5', '78')]

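d = collections.defaultdict(int)  # missing keys start at 0
for a, b, c in L:
    d[a, b] += int(c)  # key on (item number, item name), sum the quantity

# rebuild (number, name, total) triplets from the dictionary
newlist = [(a, b, c) for (a, b), c in d.items()]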

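Result (the order shown is not insertion order; on Python 3.7+, where dicts preserve insertion order, the 'A54' entry would come first):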

>>> newlist
[('B52', 'ItemName2', 51),
 ('C45', 'ItemName3', 3),
 ('A54', 'ItemName1', 33),
 ('G22', 'ItemName5', 78)]
>>> 

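The complexity is then O(n), since each dictionary update takes constant time on average.

Note that your original data seems to contain leading/trailing spaces in the item names. They need to be stripped when building the dictionary key (otherwise identical items would not be grouped together), for instance: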

d[a,b.strip()] += int(c)
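Putting the pieces together on the original data, spaces included, might look like the sketch below. The function name merge_quantities is hypothetical, introduced here only to make the example self-contained.

```python
from collections import defaultdict

def merge_quantities(items):
    """Sum the quantities of entries sharing the same (number, stripped name)."""
    totals = defaultdict(int)
    for number, name, qty in items:
        # strip the name so 'ItemName1 ' and ' ItemName1' land on the same key
        totals[number, name.strip()] += int(qty)
    return [(number, name, total) for (number, name), total in totals.items()]

L = [('A54', 'ItemName1 ', '18'), ('B52', 'ItemName2 ', '51'), ('C45', 'ItemName3 ', '3'),
     ('A54', ' ItemName1', '15'), ('G22', ' ItemName5', '78')]

merged = merge_quantities(L)
print(merged)
```

Without the strip() call, the two 'A54' entries would end up under different keys and would not be summed.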

I think it might be a good idea to use a dictionary, since you seem to be treating the first item of each tuple as a key. I personally would aggregate them like this:

from collections import OrderedDict

L = [('A54', 'ItemName1 ', '18'), ('B52', 'ItemName2 ', '51'), ('C45', 'ItemName3 ', '3'), ('A54', ' ItemName1', '15'), ('G22', ' ItemName5', '78')]

sorted_L = OrderedDict()
for item in L:
    # membership can be tested on the dict directly, no need for .keys()
    if item[0] in sorted_L:
        sorted_L[item[0]] += int(item[2])
    else:
        sorted_L[item[0]] = int(item[2])

print(sorted_L)

Which results in:

OrderedDict([('A54', 33), ('B52', 51), ('C45', 3), ('G22', 78)])

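This maintains the order in which the items first appear in your list, because it uses an OrderedDict instead of a normal dictionary. (Since Python 3.7, a plain dict also preserves insertion order.)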
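Note that this version keeps only the item number and the total. If the item name should be kept as well, one possible variation (a sketch that is not part of the answer above, and that assumes Python 3.7+ so a plain dict keeps insertion order) is to store the name next to the running total:

```python
L = [('A54', 'ItemName1 ', '18'), ('B52', 'ItemName2 ', '51'), ('C45', 'ItemName3 ', '3'),
     ('A54', ' ItemName1', '15'), ('G22', ' ItemName5', '78')]

totals = {}  # item number -> (stripped name, running total)
for number, name, qty in L:
    if number in totals:
        saved_name, total = totals[number]
        totals[number] = (saved_name, total + int(qty))
    else:
        # the first occurrence decides which name is kept
        totals[number] = (name.strip(), int(qty))

merged = [(number, name, total) for number, (name, total) in totals.items()]
print(merged)
# [('A54', 'ItemName1', 33), ('B52', 'ItemName2', 51), ('C45', 'ItemName3', 3), ('G22', 'ItemName5', 78)]
```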
