

I need to remove duplicates from a list but add up their numeric values

I have a list that looks like this:

[('A54', 'ItemName1 ', '18'), ('B52', 'ItemName2 ', '51'), ('C45', 'ItemName3 ', '3'), ('A54', ' ItemName1', '15'), ('G22', ' ItemName5', '78')]

The first item in each tuple is an item number, the second is the item name and the third is the quantity.

What would be the best way to remove duplicate entries from the list while adding up their quantities?

I've tried sorting the list alphabetically using sort(), but for some reason it doesn't work.

My sorting attempt looks like this:

L = [('A54', 'ItemName1 ', '18'), ('B52', 'ItemName2 ', '51'), ('C45', 'ItemName3 ', '3'), ('A54', ' ItemName1', '15'), ('G22', ' ItemName5', '78')]
L.sort()

print(L)

The result is always None.

You're probably doing L = L.sort(), which explains the None result: list.sort() sorts the list in place and returns None (a classic issue, see "Why does "return list.sort()" return None, not the list?").
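To illustrate the difference, here is a small sketch using a shortened copy of the question's data: list.sort() mutates the list and returns None, whereas the built-in sorted() returns a new sorted list and leaves the original untouched.

```python
L = [('B52', 'ItemName2', '51'), ('A54', 'ItemName1', '18')]

result = L.sort()      # sorts L in place; the return value is None
print(result)          # None
print(L)               # L itself is now sorted

L2 = [('B52', 'ItemName2', '51'), ('A54', 'ItemName1', '18')]
new_list = sorted(L2)  # returns a new sorted list; L2 keeps its order
print(new_list)
```

Tuples are compared element by element, so the list ends up ordered by item number first.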

Anyway, sorting and grouping (for instance with itertools.groupby) isn't the best approach here, because of its complexity: O(n*log(n)) for the sort plus O(n) for the grouping.
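For comparison, a minimal sketch of the sort-and-group approach (assuming the names have already been stripped of spaces, as in the data used below): sorting makes duplicates adjacent, then itertools.groupby sums each run.

```python
from itertools import groupby
from operator import itemgetter

L = [('A54', 'ItemName1', '18'), ('B52', 'ItemName2', '51'), ('C45', 'ItemName3', '3'),
     ('A54', 'ItemName1', '15'), ('G22', 'ItemName5', '78')]

key = itemgetter(0, 1)  # group on (item number, item name)
grouped = []
# sorted() is the O(n*log(n)) step; it puts duplicates next to each other
for (num, name), group in groupby(sorted(L, key=key), key=key):
    # summing every run is O(n) in total
    grouped.append((num, name, sum(int(qty) for _, _, qty in group)))

print(grouped)
```

It works, but the sort is only there to make groupby possible, which is why a dictionary-based approach is preferable.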

Instead, create a collections.defaultdict and "count" your items (collections.Counter can't simply be fed the list here, since it counts occurrences, whereas the total depends on the value of the third field converted to an integer).

Then rebuild the triplets by unpacking the dictionary keys and values.

import collections

L = [('A54', 'ItemName1', '18'), ('B52', 'ItemName2', '51'),('C45', 'ItemName3', '3'),('A54', 'ItemName1', '15'), ('G22', 'ItemName5', '78')]

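d = collections.defaultdict(int)  # missing keys start at 0
for a,b,c in L:
    d[a,b] += int(c)  # key on (item number, name) and sum the quantities

newlist = [(a,b,c) for (a,b),c in d.items()]  # back to (number, name, total) triplets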

Result:

>>> newlist
[('B52', 'ItemName2', 51),
 ('C45', 'ItemName3', 3),
 ('A54', 'ItemName1', 33),
 ('G22', 'ItemName5', 78)]
>>> 

The complexity is then O(n), since dictionary insertions and lookups are O(1) on average.

Note that your original data seems to contain leading/trailing spaces. They are easy to strip when building the dictionary (otherwise grouping would not work, since 'ItemName1 ' and ' ItemName1' are different strings), for instance like this:

d[a,b.strip()] += int(c)
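Putting that together, here is a sketch of the full defaultdict version run on the question's original data, spaces included:

```python
import collections

# The question's data, including the stray leading/trailing spaces
L = [('A54', 'ItemName1 ', '18'), ('B52', 'ItemName2 ', '51'), ('C45', 'ItemName3 ', '3'),
     ('A54', ' ItemName1', '15'), ('G22', ' ItemName5', '78')]

d = collections.defaultdict(int)
for a, b, c in L:
    # strip the name so 'ItemName1 ' and ' ItemName1' share one key
    d[a, b.strip()] += int(c)

newlist = [(a, b, c) for (a, b), c in d.items()]
print(newlist)
```

Without the strip() call, the two A54 entries would end up under different keys and would not be merged.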

I think it might be a good idea to use a dictionary, since you seem to be treating the first item of each tuple as a key. I would personally group them like this:

from collections import OrderedDict

L = [('A54', 'ItemName1 ', '18'), ('B52', 'ItemName2 ', '51'), ('C45', 'ItemName3 ', '3'), ('A54', ' ItemName1', '15'), ('G22', ' ItemName5', '78')]

sorted_L = OrderedDict()
for item in L:
    # item is (item number, name, quantity as a string)
    if item[0] in sorted_L:
        sorted_L[item[0]] += int(item[2])
    else:
        sorted_L[item[0]] = int(item[2])

print(sorted_L)

Which results in:

OrderedDict([('A54', 33), ('B52', 51), ('C45', 3), ('G22', 78)])

Using an OrderedDict instead of a normal dictionary keeps the items in the order they first appear in your list (since Python 3.7, a plain dict preserves insertion order as well).
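For comparison, a minimal sketch of the same grouping with a plain dict and dict.get, relying on the insertion ordering guaranteed by Python 3.7 and later. The variable name totals is just for illustration.

```python
L = [('A54', 'ItemName1 ', '18'), ('B52', 'ItemName2 ', '51'), ('C45', 'ItemName3 ', '3'),
     ('A54', ' ItemName1', '15'), ('G22', ' ItemName5', '78')]

totals = {}  # plain dict: keys stay in insertion order on Python 3.7+
for number, _name, qty in L:
    # dict.get(key, 0) supplies 0 the first time an item number is seen
    totals[number] = totals.get(number, 0) + int(qty)

print(totals)
```

Like the OrderedDict version, this keys on the item number only, so the item names are not kept in the result.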

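Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.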

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM