

How to efficiently remove duplicates from a python list based on equality, not hashes

We've got a list of instances of a class. We effectively want a set, i.e. a collection with no repeated elements. Some elements of the list compare equal, but their hashes differ because they were instantiated separately. So a == b is True, while a is b is False.
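A minimal reproduction of the situation described above. Point is a hypothetical stand-in for the third-party class: it compares by value but keeps the identity-based hash, so a set treats equal instances as distinct.

```python
class Point:
    """Hypothetical stand-in: equality by value, hashing by identity."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    # Defining __eq__ alone would set __hash__ to None in Python 3;
    # reuse the identity-based default explicitly to mimic the question.
    __hash__ = object.__hash__


a, b = Point(1, 2), Point(1, 2)
print(a == b)       # True
print(a is b)       # False
print(len({a, b}))  # 2: the set keeps both, because their hashes differ
```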

Is there a way to vectorise this problem, or otherwise make it efficient? The only solutions we can think of involve for loops, and it seems like there might be a more efficient solution.

EDIT: I think this is different from "Elegant ways to support equivalence", because equivalence already works well; it's just that set relies on comparing hashes.

EDIT: The for-loop solution would go something like this: sort the list, then iterate over it, removing the current value if it's the same as the previous one.
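The sort-then-scan idea from this edit can be sketched with itertools.groupby, which groups consecutive elements that compare equal with ==. This is a sketch under the assumption that the instances (or a key you pass to sorted) are orderable and that the ordering places equal items next to each other; the function name dedupe_sorted is hypothetical. The cost is O(n log n) for the sort plus one linear pass.

```python
from itertools import groupby


def dedupe_sorted(items, key=None):
    """Sort, then keep the first item of each run of equal neighbours.

    Assumes the items (or key(item), if a key is given) are orderable and
    that the ordering puts equal items next to each other.
    """
    ordered = sorted(items, key=key)
    # With no key, groupby compares consecutive items using ==
    return [first for first, _run in groupby(ordered)]


print(dedupe_sorted([3, 1, 3, 2, 1]))  # [1, 2, 3]
```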

EDIT: To be clear, we don't own this class; we just have instances of it. So we could wrap the instances and implement a more useful hash function. However, this seems like it might be almost as expensive as the for-loop approach (could be wrong, though).
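Wrapping each instance to give it a better hash amounts to deduplicating on a hashable key derived from the object. A minimal sketch, assuming you can write such a key function (dedupe_by_key and the key argument are hypothetical names): it does one dict lookup per item, so it is linear on average, which is cheaper than sorting rather than as expensive.

```python
def dedupe_by_key(items, key):
    """Keep the first item for each distinct key(item), in original order.

    key is a hypothetical function you supply. It must return a hashable
    value; two items count as duplicates exactly when their keys are equal,
    so choose a key that is consistent with the class's ==.
    """
    first_seen = {}
    for item in items:
        # One hash lookup per item: O(n) on average, versus O(n log n) to sort
        first_seen.setdefault(key(item), item)
    return list(first_seen.values())


# Example with case-insensitive equality as the equivalence
words = ["apple", "Apple", "pear", "APPLE", "Pear"]
print(dedupe_by_key(words, key=str.lower))  # ['apple', 'pear']
```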

EDIT: Sorry if it feels like I'm moving the goalposts a bit here: there isn't a simple value on the object that can be substituted in for a hash. That approach would need to somehow generate a UID for each different instance.
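When no hashable key can be derived at all, the fallback that relies only on == is a pairwise membership check. A sketch (dedupe_by_eq is a hypothetical name); it never hashes or orders the items, but it is quadratic in the worst case, so it only makes sense for modest list sizes.

```python
def dedupe_by_eq(items):
    """Remove duplicates using only ==, no hashing or ordering.

    Quadratic in the worst case, so only suitable when no hashable key
    can be derived from the objects.
    """
    unique = []
    for item in items:
        # `in` on a list tests each element by identity, then by ==
        if item not in unique:
            unique.append(item)
    return unique


# Lists are unhashable, so a set could not be used here at all
print(dedupe_by_eq([[1, 2], [3], [1, 2], [3], [4]]))  # [[1, 2], [3], [4]]
```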

I assume you are working with a class you created yourself, and that you've implemented your own equality method.

It's true that the default hash method inherited from object returns different values for different instances. In CPython, the default object.__hash__ is derived from the object's id(), i.e. its memory address; the hash randomization enabled by default since Python 3.3 applies to str and bytes, not to this identity-based hash.
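A quick check of the default behaviour in CPython 3, using two hypothetical classes. It also shows a related rule: a class that defines __eq__ without __hash__ gets __hash__ set to None, so its instances cannot go into a set at all unless a hash is defined.

```python
class Plain:
    pass


class EqOnly:
    def __eq__(self, other):
        return isinstance(other, EqOnly)


p, q = Plain(), Plain()
# Default hash is identity-based, so two live instances hash differently
print(hash(p) == hash(q))  # False

# In Python 3, defining __eq__ without __hash__ sets __hash__ to None
print(EqOnly.__hash__)  # None
try:
    hash(EqOnly())
except TypeError as exc:
    print("TypeError:", exc)
```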

However, you can easily implement your own __hash__ method to solve this.

See: How to implement a good __hash__ function in python

__hash__ must return the same value for objects that compare equal. It also shouldn't change over the lifetime of the object, so in general you only implement it for immutable objects.
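If you do own the class, one way to satisfy both rules at once is a frozen dataclass. A sketch with a hypothetical Point class: frozen=True makes instances immutable, and together with the default eq=True the dataclass decorator generates __eq__ and __hash__ from the same fields. (This does not apply directly when, as in the question, the class belongs to someone else.)

```python
from dataclasses import dataclass


# frozen=True makes instances immutable; with the default eq=True,
# dataclass then generates __eq__ and __hash__ from the same fields.
@dataclass(frozen=True)
class Point:
    x: int
    y: int


points = [Point(1, 2), Point(3, 4), Point(1, 2)]
print(len(set(points)))  # 2
```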

This may not be the answer you want, but it is a clean and easy way to do it. Then you can just create a set normally.
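Once the objects hash consistently, note that a set does not keep the list's original order. If order matters, dict.fromkeys deduplicates using the same __hash__/__eq__ pair while keeping the first occurrence of each item (dicts preserve insertion order from Python 3.7). A small sketch, with tuples standing in for any correctly hashable objects:

```python
# Tuples stand in here for any objects with a consistent __hash__/__eq__
items = [("b", 2), ("a", 1), ("b", 2), ("c", 3), ("a", 1)]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
unique = list(dict.fromkeys(items))
print(unique)  # [('b', 2), ('a', 1), ('c', 3)]
```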

Maybe this is what you need? Make the hash a function of the class fields.

Here is a simple example:

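class A:
    def __init__(self, v):
        self.val = v

    def __eq__(self, other):
        # Compare by value; let Python handle comparisons with unrelated types
        if not isinstance(other, A):
            return NotImplemented
        return self.val == other.val

    def __hash__(self):
        # Equal objects must hash equal, so derive the hash from the same field
        # (hash() also works when val is not an int)
        return hash(self.val)

    def __repr__(self):
        return 'A(%r)' % (self.val,)

a = set([A(2), A(3), A(4), A(2), A(10), A(4)])
print(a)
# {A(10), A(2), A(3), A(4)}  (set iteration order is not guaranteed)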

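Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.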

 
© 2020-2024 STACKOOM.COM