Python: Count and remove duplicates in a list of lists
I have a list of lists:
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
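a = set(a)  # raises TypeError: unhashable type: 'list'
What I need to do is remove all duplicates within each inner list while keeping the original order. I also need to count how many times each value occurs in each inner list. For example:
The list after removing duplicates: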
a = [[1.0],
[2.0, 3.0, 4.0],
[3.0, 5.0],
[1.0, 4.0, 5.0],
[5.0],
[1.0]
]
The count of each value in the list of lists:
b = [[13],
[6, 5, 4],
[8, 3],
[1, 3, 3],
[3],
[1]
]
My code:
for index, lst in enumerate(a):
    seen = set()
    # seen.add(i) returns None, so "is None" is always True; it only records i as seen
    a[index] = [i for i in lst if i not in seen and seen.add(i) is None]
You can use itertools.groupby:
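The loop above removes duplicates while keeping first-seen order, but it does not count them. A minimal sketch that does both in one pass, assuming Python 3.7+ (where plain dicts preserve insertion order); the helper name `dedupe_and_count` is hypothetical, and the data is the question's list written compactly:

```python
def dedupe_and_count(rows):
    """Return (unique_values, counts) for each inner list, in first-seen order."""
    uniques, counts = [], []
    for row in rows:
        tally = {}  # relies on dicts keeping insertion order (Python 3.7+)
        for value in row:
            tally[value] = tally.get(value, 0) + 1
        uniques.append(list(tally.keys()))
        counts.append(list(tally.values()))
    return uniques, counts


# Same data as in the question, written with list repetition.
a = [[1.0] * 13,
     [2.0] * 6 + [3.0] * 5 + [4.0] * 4,
     [3.0] * 8 + [5.0] * 3,
     [1.0] + [4.0] * 3 + [5.0] * 3,
     [5.0] * 3,
     [1.0]]

b, c = dedupe_and_count(a)
print(b)  # [[1.0], [2.0, 3.0, 4.0], [3.0, 5.0], [1.0, 4.0, 5.0], [5.0], [1.0]]
print(c)  # [[13], [6, 5, 4], [8, 3], [1, 3, 3], [3], [1]]
```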
from itertools import groupby
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
b = []
c = []
for inner in a:
    new_b = []
    new_c = []
    # groupby only merges adjacent equal values, so sort each inner list first
    for value, repeated in groupby(sorted(inner)):
        new_b.append(value)
        new_c.append(sum(1 for _ in repeated))  # length of this run
    b.append(new_b)
    c.append(new_c)
print(b)
# [[1.0], [2.0, 3.0, 4.0], [3.0, 5.0], [1.0, 4.0, 5.0], [5.0], [1.0]]
print(c)
# [[13], [6, 5, 4], [8, 3], [1, 3, 3], [3], [1]]
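`groupby` only merges adjacent equal items, which is why the answer sorts each inner list first. Sorting means the unique values come out in ascending order rather than first-seen order; with the sample data the two coincide because every inner list is already sorted. A small sketch of the difference, using a made-up unsorted `row`:

```python
from itertools import groupby

row = [3.0, 1.0, 3.0, 1.0, 1.0]  # hypothetical unsorted input

# Without sorting: each run of adjacent equal values is reported separately.
runs = [(value, sum(1 for _ in group)) for value, group in groupby(row)]
print(runs)  # [(3.0, 1), (1.0, 1), (3.0, 1), (1.0, 2)]

# With sorting: one group per distinct value, in ascending order.
totals = [(value, sum(1 for _ in group)) for value, group in groupby(sorted(row))]
print(totals)  # [(1.0, 3), (3.0, 2)]
```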
Using collections.Counter():
from collections import Counter
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
dic_count = [ Counter(x) for x in a]
print(dic_count)
'''
[
Counter({1.0: 13}),
Counter({2.0: 6, 3.0: 5, 4.0: 4}),
Counter({3.0: 8, 5.0: 3}),
Counter({4.0: 3, 5.0: 3, 1.0: 1}),
Counter({5.0: 3}),
Counter({1.0: 1})
]
'''
print([list(x.keys()) for x in dic_count])  # list() needed in Python 3, where keys() is a view
'''
[
[1.0],
[2.0, 3.0, 4.0],
[3.0, 5.0],
[1.0, 4.0, 5.0],
[5.0],
[1.0]
]
'''
print([list(x.values()) for x in dic_count])  # list() needed in Python 3, where values() is a view
'''
[
[13],
[6, 5, 4],
[8, 3],
[1, 3, 3],
[3],
[1]
]
'''
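Unlike the `groupby` approach, `Counter` does not need sorted input. Since Python 3.7 a `Counter`, being a dict subclass, yields its keys in first-seen order, so the unique values keep the original sequence. A small sketch on made-up unsorted data, assuming Python 3.7+:

```python
from collections import Counter

row = [3.0, 1.0, 3.0, 1.0, 1.0]  # hypothetical unsorted input

counts = Counter(row)
print(list(counts.keys()))    # [3.0, 1.0]  (first-seen order, Python 3.7+)
print(list(counts.values()))  # [2, 3]
```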
This works as well:
b = [list(set(x)) for x in a]
c = [[a[ind].count(x) for x in ele] for ind, ele in enumerate(b)]
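Two caveats with this approach: a `set` has no guaranteed iteration order, so `list(set(x))` only happens to match the original order for this data, and `list.count` rescans the whole inner list for every unique value. A sketch that keeps first-seen order by swapping in `dict.fromkeys`, assuming Python 3.7+ (insertion-ordered dicts); it still uses `count`, so it remains quadratic in the worst case:

```python
a = [[2.0, 2.0, 3.0, 4.0, 4.0], [5.0, 3.0, 5.0]]  # shortened, made-up sample

# dict.fromkeys keeps the first occurrence of each value, in order (Python 3.7+).
b = [list(dict.fromkeys(x)) for x in a]
c = [[a[ind].count(x) for x in ele] for ind, ele in enumerate(b)]
print(b)  # [[2.0, 3.0, 4.0], [5.0, 3.0]]
print(c)  # [[2, 1, 2], [2, 1]]
```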
Timings for 50 sublists:
In [8]: %%timeit
...: b = []
...: c = []
...: for inner in a:
...:     new_b = []
...:     new_c = []
...:     for value, repeated in groupby(sorted(inner)):
...:         new_b.append(value)
...:         new_c.append(sum(1 for _ in repeated))
...:     b.append(new_b)
...:     c.append(new_c)
...:
10 loops, best of 3: 20.4 ms per loop
In [9]: %%timeit
...: dic_count = [ Counter(x) for x in a]
...: [ x.keys() for x in dic_count ]
...: [ x.values() for x in dic_count ]
...:
10 loops, best of 3: 39.1 ms per loop
In [10]: %%timeit
....: b = [list(set(x)) for x in a]
....: c = [[a[ind].count(x) for x in ele] for ind, ele in enumerate(b)]
....:
100 loops, best of 3: 7.95 ms per loop
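The figures above come from IPython's `%%timeit` and depend on the data and interpreter. A rough way to run a similar comparison with the standard `timeit` module; the 50-sublist test data and the three function names are made up for illustration, so absolute numbers will differ:

```python
import random
import timeit
from collections import Counter
from itertools import groupby

random.seed(0)
# Hypothetical test data: 50 sorted sublists of 1,000 small integral floats.
a = [sorted(float(random.randint(1, 5)) for _ in range(1000)) for _ in range(50)]


def with_groupby(rows):
    b, c = [], []
    for inner in rows:
        pairs = [(v, sum(1 for _ in g)) for v, g in groupby(sorted(inner))]
        b.append([v for v, _ in pairs])
        c.append([n for _, n in pairs])
    return b, c


def with_counter(rows):
    counters = [Counter(x) for x in rows]
    return [list(x.keys()) for x in counters], [list(x.values()) for x in counters]


def with_set_count(rows):
    b = [list(set(x)) for x in rows]
    c = [[rows[i].count(v) for v in ele] for i, ele in enumerate(b)]
    return b, c


for func in (with_groupby, with_counter, with_set_count):
    # Average wall-clock time per call over 10 calls.
    seconds = timeit.timeit(lambda: func(a), number=10) / 10
    print(f"{func.__name__}: {seconds * 1000:.2f} ms per call")
```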
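Hi, you probably shouldn't use this code (I was just playing with some features I hadn't tried before), but it gets you the output you want...
from collections import Counter
# Python 3: itertools.izip no longer exists and zip returns an iterator, so the outer zip is wrapped in list()
vals = list(zip(*(zip(*zip(row.keys(), row.values())) for row in (dict(Counter(each)) for each in a))))
print(vals[0], "\n", vals[1])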
If I were you, I'd just go with this:
[dict(Counter(each)) for each in a]
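Very clean output, and much more readable than my solution.
I recently had to build something similar. My solution was to iterate over the list and build a list of pairs, each holding a value and the number of times it occurs in the original list.
def count_duplicates(input_list):
    # returns [value, count] pairs in first-seen order
    count_list = []
    for each in input_list:
        new_count = [each, input_list.count(each)]
        if new_count not in count_list:
            count_list.append(new_count)
    return count_list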
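By calling the function above for each inner list in a for-each loop and collecting the results in a new list, you can build an output that contains everything you need.
There's no need to go to extremes to find this; it can be done with simple arithmetic.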
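Applied to the list of lists from the question, that loop might look like the sketch below. The function is repeated so the block runs on its own, and splitting the pairs into `b` and `c` is an assumption about the desired output shape:

```python
def count_duplicates(input_list):
    """Return [value, count] pairs in first-seen order."""
    count_list = []
    for each in input_list:
        new_count = [each, input_list.count(each)]
        if new_count not in count_list:
            count_list.append(new_count)
    return count_list


# Same data as in the question, written with list repetition.
a = [[1.0] * 13,
     [2.0] * 6 + [3.0] * 5 + [4.0] * 4,
     [3.0] * 8 + [5.0] * 3,
     [1.0] + [4.0] * 3 + [5.0] * 3,
     [5.0] * 3,
     [1.0]]

results = [count_duplicates(inner) for inner in a]
b = [[value for value, _ in pairs] for pairs in results]
c = [[count for _, count in pairs] for pairs in results]
print(b)  # [[1.0], [2.0, 3.0, 4.0], [3.0, 5.0], [1.0, 4.0, 5.0], [5.0], [1.0]]
print(c)  # [[13], [6, 5, 4], [8, 3], [1, 3, 3], [3], [1]]
```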
the_list = [34, 40, 17, 6, 6, 48, 35, 8, 23, 41, 3, 36, 14, 44, 4, 46, 13, 26, 8, 41, 48, 39, 3, 43, 7, 20, 44, 17, 14, 18, 4, 3, 38, 42, 4, 19, 50, 38, 19, 40, 3, 26, 33, 26, 47, 46, 30, 12, 28, 32]
print(len(the_list) - len(list(set(the_list))))
With comments:
# list with duplicates
the_list = [34, 40, 17, 6, 6, 48, 35, 8, 23, 41, 3, 36, 14, 44, 4, 46, 13, 26, 8, 41, 48, 39, 3, 43, 7, 20, 44, 17, 14, 18, 4, 3, 38, 42, 4, 19, 50, 38, 19, 40, 3, 26, 33, 26, 47, 46, 30, 12, 28, 32]
# in real lists where you don't know the number of items,
# determine it with len()
list_size = len(the_list)
# remove the duplicates using set(),
# since there was no mention of converting
# we'll also convert back to list()
the_list = list(set(the_list))
# how many duplicates?
duplicates = list_size - len(the_list)
print(f"Total items in list: {list_size}")
print(f"Number of duplicates removed: {duplicates}")