
How to remove `duplicates' in a list of instances

I have a list of instances of a certain class. This list contains `duplicates', in the sense that duplicates share the exact same attributes. I want to remove the duplicates from this list.

I can check whether two instances share the same attributes by using:

class MyClass:

    def __eq__(self, other):
        return self.__dict__ == other.__dict__

I could of course iterate through the whole list of instances and compare them element by element to remove duplicates, but I was wondering if there is a more Pythonic way to do this, preferably using the in operator plus a list comprehension.

sets (no order)

A set cannot contain duplicate elements, so list(set(content)) will deduplicate a list. This is reasonably efficient and is probably one of the better ways to do it. You will need to define a __hash__ method for your class, though. It must return the same value for equal elements; unequal elements should get different values where possible, but collisions are allowed and only cost speed. As long as that rule holds, the hash value may change between runs without causing issues.
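As a minimal sketch of what that could look like for the __dict__-based __eq__ from the question: the class below (the name Record and its attributes are made up for illustration) hashes a frozenset of its attribute items, so instances with equal attributes get equal hashes. It assumes every attribute value is itself hashable. In Python 3, a class that defines __eq__ without __hash__ is unhashable, so set() would raise a TypeError without the second method.

```python
class Record:
    """Hypothetical example class; any set of attributes would do."""

    def __init__(self, name, size):
        self.name = name
        self.size = size

    def __eq__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return self.__dict__ == other.__dict__

    def __hash__(self):
        # Equal __dict__s produce equal frozensets, hence equal hashes.
        # Assumes all attribute values are hashable.
        return hash(frozenset(self.__dict__.items()))


content = [Record("a", 1), Record("b", 2), Record("a", 1)]
unique = list(set(content))  # two elements remain; their order is arbitrary
```

Instances should not be mutated while they sit in a set, because a changed attribute changes the hash.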

index function (stable order)

You could do lambda l: [l[index] for index in range(len(l)) if index == l.index(l[index])]. This keeps an element only at the position of its first occurrence in the list.
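The same idea can be sketched as a named function using enumerate; dedupe_by_index is a name chosen here, not one from the answer. It relies only on __eq__, so no __hash__ is needed, but each index call scans the list, which makes the approach quadratic in the list length.

```python
def dedupe_by_index(items):
    """Keep each element only at the position of its first occurrence."""
    # list.index returns the position of the first element equal to x,
    # so the comparison holds only for first occurrences.
    return [x for i, x in enumerate(items) if items.index(x) == i]


result = dedupe_by_index([3, 1, 3, 2, 1])
```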

in operator (stable order)

def uniquify(content):
    result = []
    for element in content:
        if element not in result:
            result.append(element)
    return result

This appends each element to the output list unless an equal element is already there.
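A short usage sketch for the loop above, assuming a hypothetical class Tag that defines only __eq__ in the style of the question. It illustrates that this approach needs no __hash__, since membership in a list is tested with ==.

```python
class Tag:
    """Hypothetical class with __eq__ only (no __hash__)."""

    def __init__(self, label):
        self.label = label

    def __eq__(self, other):
        if not isinstance(other, Tag):
            return NotImplemented
        return self.__dict__ == other.__dict__


def uniquify(content):
    result = []
    for element in content:
        # List membership checks identity first, then ==,
        # so this ends up calling Tag.__eq__.
        if element not in result:
            result.append(element)
    return result


tags = [Tag("x"), Tag("y"), Tag("x")]
labels = [t.label for t in uniquify(tags)]
```

The price is a linear scan of the result list for every element, so the total work grows quadratically.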

A little more on the set approach. You can safely implement a hash by delegating to a tuple's hash: just hash a tuple of all the attributes you want to look at. You will also need to define an __eq__ that compares those same attributes.

class MyClass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

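    def __eq__(self, other):
        # Let Python handle comparisons with unrelated types.
        if not isinstance(other, MyClass):
            return NotImplemented
        return (self.a, self.b, self.c) == (other.a, other.b, other.c)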

    def __hash__(self):
        return hash((self.a, self.b, self.c))

    def __repr__(self):
        return "MyClass({!r}, {!r}, {!r})".format(self.a, self.b, self.c)

As you're doing so much tuple construction, you could just make your class iterable:

def __iter__(self):
    return iter((self.a, self.b, self.c))

This enables you to call tuple on self instead of laboriously writing out .a, .b, .c and so on.
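Put together, a version of the class that uses tuple(self) for both comparison and hashing might look like the sketch below. The isinstance check is an addition made here so that comparing against unrelated types does not fail.

```python
class MyClass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def __iter__(self):
        # Yield the attributes that define equality, in a fixed order.
        return iter((self.a, self.b, self.c))

    def __eq__(self, other):
        if not isinstance(other, MyClass):
            return NotImplemented
        return tuple(self) == tuple(other)

    def __hash__(self):
        return hash(tuple(self))


x = MyClass(1, 2, 3)
y = MyClass(1, 2, 3)
same = (x == y) and (hash(x) == hash(y))
```

Adding a new attribute to the comparison then only requires changing __iter__.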

You can then do something like this:

def unordered_elim(l):
    return list(set(l))

If you want to preserve ordering, you can use an OrderedDict instead:

from collections import OrderedDict

def ordered_elim(l):
    return list(OrderedDict.fromkeys(l))

Deduplicating through an OrderedDict should be faster than using in or index, while still preserving ordering. You can test it with something like this:

data = [MyClass("this", "is a", "duplicate"),
        MyClass("first", "unique", "datum"),
        MyClass("this", "is a", "duplicate"),
        MyClass("second", "unique", "datum")]

print(unordered_elim(data))
print(ordered_elim(data))

With this output (the order of the first line comes from a set, so it is arbitrary and may differ between runs):

[MyClass('first', 'unique', 'datum'), MyClass('second', 'unique', 'datum'), MyClass('this', 'is a', 'duplicate')]
[MyClass('this', 'is a', 'duplicate'), MyClass('first', 'unique', 'datum'), MyClass('second', 'unique', 'datum')]
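On Python 3.7 and later, the built-in dict is guaranteed to preserve insertion order, so the ordered variant can be sketched without the import. The function name ordered_elim_dict is made up here; as with the versions above, the elements must be hashable.

```python
def ordered_elim_dict(l):
    # dict keys are unique and, from Python 3.7 on, keep insertion order,
    # so the first occurrence of each element determines its position.
    return list(dict.fromkeys(l))


result = ordered_elim_dict(["b", "a", "b", "c", "a"])
```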

NB: if any of your attributes aren't hashable, this won't work, and you'll either need to work around it (for example, change a list to a tuple) or use a slow O(n^2) approach like the in-based one.
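One way the list-to-tuple workaround might look, as a sketch: the hypothetical class Basket keeps a list attribute but converts it to a tuple inside a private helper (_key, a name invented here) that both __eq__ and __hash__ use. It assumes the list's own elements are hashable and that instances are not mutated while stored in a set or dict.

```python
class Basket:
    """Hypothetical class with a list attribute."""

    def __init__(self, owner, items):
        self.owner = owner
        self.items = items  # a list, so not hashable as-is

    def _key(self):
        # Convert the list to a tuple just for comparison and hashing.
        return (self.owner, tuple(self.items))

    def __eq__(self, other):
        if not isinstance(other, Basket):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


baskets = [Basket("ann", [1, 2]), Basket("bob", [3]), Basket("ann", [1, 2])]
unique = list(dict.fromkeys(baskets))  # keeps first occurrences (Python 3.7+)
owners = [b.owner for b in unique]
```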
