
Remove unique tuples from a list of tuples in python

I'm writing a program that finds duplicated files, and right now I have a list of tuples:

mylist = [(file1, size1, hash1),
          (file2, size2, hash2),
          ...
          (fileN, sizeN, hashN)]

I want to remove the entries that have a unique hash, leaving only the duplicates. I'm using:

def dropunique(mylist):
    templist = []
    while mylist:
        mycandidate = mylist.pop()
        templist.append([mycandidate])
        for myfile in mylist:
            if myfile[-1] == mycandidate[-1]:
                templist[-1].append(myfile)
                mylist.remove(myfile)
    for myfile in templist:
        if len(myfile) != 1:
            mylist.append(myfile)
    templist = [item for sublist in mylist for item in sublist]
    return templist

Here I pop an entry, look for other entries with the same hash, and group them into a list of lists that share a hash. Then I build another list containing only the sublists with len > 1 and flatten the resulting list of lists into a simple list. My problem is that when I remove an entry from a list while iterating over that same list with 'for myfile in mylist:', the loop skips some entries and leaves them behind.
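The skipping behavior can be reproduced with a minimal, hypothetical example: removing an item while a for loop walks the same list shifts later elements left, so the iterator steps past the element that slid into the removed slot.

```python
# Hypothetical minimal example of mutating a list while iterating over it.
lst = [1, 1, 2, 2, 3]
for x in lst:
    if x < 3:
        lst.remove(x)  # shifts later elements left, so the loop skips one
print(lst)  # [1, 2, 3] - one 1 and one 2 survive even though both are < 3
```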

Copy your list into a dictionary where the hash is the key, and on a second pass remove those with a single count - you can even use collections.Counter to spare one or two lines of code:

from collections import Counter

counter = Counter(t[2] for t in list_)

result = [value for value in list_ if counter[value[2]] > 1]

(Non-related tip: avoid naming your variables "list" or "dict" - that shadows Python's built-ins of the same name.)
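A runnable sketch of the Counter approach above, using hypothetical sample data shaped like the question's tuples:

```python
from collections import Counter

# Hypothetical sample data: file2 and file3 share a hash.
list_ = [('file1', 'size1', 'hash1'),
         ('file2', 'size2', 'hash2'),
         ('file3', 'size3', 'hash2')]

counter = Counter(t[2] for t in list_)  # count occurrences of each hash
result = [value for value in list_ if counter[value[2]] > 1]
print(result)  # [('file2', 'size2', 'hash2'), ('file3', 'size3', 'hash2')]
```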

I would use a defaultdict() to group the tuples by their hash value:

from collections import defaultdict

# Group the tuples by their hashvalues
d = defaultdict(list)
for tup in data:
    filename, size, hashvalue = tup
    d[hashvalue].append(tup)

# Display groups of tuples that have more than one tuple
for hashvalue, tuples in d.items():
    if len(tuples) > 1:
        print('Tuples with %r in common' % hashvalue)
        for tup in tuples:
            print(tup)
        print()
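If a flat list of the duplicated entries is wanted rather than printed groups, the grouped dict can be flattened in one comprehension (hypothetical sample data):

```python
from collections import defaultdict

# Hypothetical sample data; hash 'h2' occurs twice.
data = [('file1', 's1', 'h1'),
        ('file2', 's2', 'h2'),
        ('file3', 's3', 'h2')]

d = defaultdict(list)
for tup in data:
    d[tup[2]].append(tup)  # group by hash value

# Keep only the groups with more than one tuple, flattened.
duplicates = [tup for tuples in d.values() if len(tuples) > 1 for tup in tuples]
print(duplicates)  # [('file2', 's2', 'h2'), ('file3', 's3', 'h2')]
```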

Solution using groupby

from itertools import groupby
from operator import itemgetter

my_list = [(1, 2, 3),
           (1, 2, 3),
           (4, 5, 6)]


vals = []

# Sort and group by the hash field (index 2), then keep every
# member of a group that has more than one entry.
for hash_val, items in groupby(sorted(my_list, key=itemgetter(2)), key=itemgetter(2)):
    results = list(items)
    if len(results) > 1:
        vals.extend(results)  # vals == [(1, 2, 3), (1, 2, 3)]

You can use a double filter like this:

filter(lambda el: len(filter(lambda item: item[2] == el[2], my_list)) > 1, my_list)

Result:

>>> my_list = [('file1', 'size1', 'hash1'), ('file2', 'size2', 'hash2'), ('file3', 'size3', 'hash3'), ('file4', 'size4', 'hash2')]
>>>
>>> filter(lambda el: len(filter(lambda item: item[2] == el[2], my_list)) > 1, my_list)
[('file2', 'size2', 'hash2'), ('file4', 'size4', 'hash2')]

Note that in Python 3, filter returns an iterator, so you'll need to convert it to a list like this: list(filter(...)).
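In Python 3 the inner filter also needs list(), because len() cannot be applied to an iterator:

```python
my_list = [('file1', 'size1', 'hash1'), ('file2', 'size2', 'hash2'),
           ('file3', 'size3', 'hash3'), ('file4', 'size4', 'hash2')]

# Both filters wrapped in list() for Python 3.
result = list(filter(lambda el: len(list(filter(lambda item: item[2] == el[2], my_list))) > 1,
                     my_list))
print(result)  # [('file2', 'size2', 'hash2'), ('file4', 'size4', 'hash2')]
```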
