
How to efficiently remove duplicates from a Python list based on equality, not hashes

We've got a list of instances of a class. We effectively want a set, i.e. a group with no repeated elements. Elements of the list which are the same compare equal, but their hashes are different as they have been instantiated separately. So a == b is True, but a is b is False.

Is there a way to vectorise this problem or otherwise make it efficient? The only solutions we can think of involve for loops, and it seems like there might be a more efficient solution.

EDIT: I think it's different from the "Elegant ways to support equivalence" question, as the equivalence works well; it's just that set relies on comparing hashes.

EDIT: The for loop solution would go something like this: sort the list, then iterate over it, removing the current value if it's the same as the last value.
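The sort-then-scan loop described above might look like the following sketch. It assumes the instances can be put in an order, here through a hypothetical `sort_key` function, which the question does not confirm. `Point` is a made-up stand-in for the real class and only supports equality.

```python
class Point:
    """Hypothetical stand-in for the real class: supports == only."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


def dedupe_sorted(items, sort_key):
    """Sort, then keep an item only if it differs from the last one kept."""
    result = []
    for item in sorted(items, key=sort_key):
        if not result or item != result[-1]:
            result.append(item)
    return result


points = [Point(1, 2), Point(0, 0), Point(1, 2), Point(0, 0), Point(3, 1)]
# sort_key is an assumption: it must place equal items next to each other
unique_points = dedupe_sorted(points, sort_key=lambda p: (p.x, p.y))
print(unique_points)  # [Point(0, 0), Point(1, 2), Point(3, 1)]
```

The sort dominates the cost at O(n log n); the scan itself is a single pass. The original order of the list is lost.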

EDIT: To be clear, we don't own this class, we just have instances of it. So we could wrap the instances and implement a more useful hash function, however, this seems like it might be almost as expensive as the for loop approach - could be wrong though
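For reference, the wrapping idea could be sketched as below. It assumes a hashable key can be derived from each instance through a hypothetical `key_func`; `Thing` and `HashableWrapper` are illustrative names, not part of any library. Each wrapper computes its hash once, so the whole pass is roughly linear in the list length.

```python
class Thing:
    """Hypothetical third-party class: equal by value, hashed by identity."""
    def __init__(self, name, size):
        self.name, self.size = name, size

    def __eq__(self, other):
        if not isinstance(other, Thing):
            return NotImplemented
        return (self.name, self.size) == (other.name, other.size)

    __hash__ = object.__hash__  # identity-based hash, as described in the question


class HashableWrapper:
    """Hashes via key_func, compares via the wrapped object's own __eq__."""
    __slots__ = ("obj", "_hash")

    def __init__(self, obj, key_func):
        self.obj = obj
        self._hash = hash(key_func(obj))  # computed once per wrapper

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, HashableWrapper):
            return NotImplemented
        return self.obj == other.obj


def dedupe(items, key_func):
    """Return the first occurrence of each distinct item, in original order."""
    wrapped = dict.fromkeys(HashableWrapper(item, key_func) for item in items)
    return [w.obj for w in wrapped]


things = [Thing("a", 1), Thing("b", 2), Thing("a", 1)]
# key_func is an assumption: equal objects must produce equal keys
unique_things = dedupe(things, key_func=lambda t: (t.name, t.size))
print(len(unique_things))  # 2
```

The key only has to agree for equal objects; it does not have to be unique, because the wrapper still falls back on the object's own `__eq__` when hashes collide.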

EDIT: sorry if it feels like I'm moving the goalposts a bit here - there isn't a simple val of the object that can be subbed in for a hash, that approach would need to somehow generate UIDs for each different instance.
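If neither a hashable key nor an ordering is available, the fallback is to compare each element against the representatives kept so far. The sketch below uses only `==`; `Tag` is a hypothetical stand-in class. The cost is O(n·k) comparisons for n elements and k distinct values, so it is only reasonable when k is small.

```python
class Tag:
    """Hypothetical stand-in class that supports == and nothing else."""
    def __init__(self, label):
        self.label = label

    def __eq__(self, other):
        if not isinstance(other, Tag):
            return NotImplemented
        return self.label == other.label


def dedupe_by_equality(items):
    """Keep the first occurrence of each item, using only == comparisons."""
    unique = []
    for item in items:
        # any() stops at the first equal representative
        if not any(item == kept for kept in unique):
            unique.append(item)
    return unique


tags = [Tag("x"), Tag("y"), Tag("x"), Tag("z"), Tag("y")]
unique_tags = dedupe_by_equality(tags)
print([t.label for t in unique_tags])  # ['x', 'y', 'z']
```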

I assume you are working with a class you created yourself and that you've implemented your own equality method.

It's true that the default __hash__ inherited from object returns different values for different instances. In CPython it is derived from id(); the hash randomization that is on by default since Python 3.3 applies to str and bytes values, not to this default.
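A quick way to see the default behaviour in Python 3 is the check below. `Plain` and `EqOnly` are hypothetical classes written for illustration. Note that a Python 3 class which defines `__eq__` without `__hash__` is not hashable at all, so it cannot be placed in a set until a `__hash__` is supplied.

```python
class Plain:
    """No __eq__ and no __hash__: both are inherited from object."""
    pass


class EqOnly:
    """Defines __eq__ only, so Python 3 sets __hash__ to None."""
    def __init__(self, v):
        self.v = v

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.v == other.v


p, q = Plain(), Plain()
print(p == q)       # False: default equality is identity
print(len({p, q}))  # 2: distinct instances stay distinct in a set

try:
    {EqOnly(1)}
    eq_only_hashable = True
except TypeError:
    eq_only_hashable = False
print(eq_only_hashable)  # False
```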

However, you can easily implement your own __hash__ method to solve this.

See also: "How to implement a good __hash__ function in python"

__hash__ should return the same value for objects that are equal. It also shouldn't change over the lifetime of the object; generally you only implement it for immutable objects.

This may not be the answer that you want, but it is a clean and easy way to do it. Then you can just create a set normally.

Maybe this is what you need? Make the hash a function of the class fields.

Here is a simple example:

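class A:
    def __init__(self, v):
        self.val = v

    def __eq__(self, other):
        # Only compare with other A instances
        if not isinstance(other, A):
            return NotImplemented
        return self.val == other.val

    def __hash__(self):
        # Equal objects must hash equally, so hash the field used by __eq__
        return hash(self.val)

    def __repr__(self):
        return 'A(%s)' % self.val

a = set([A(2), A(3), A(4), A(2), A(10), A(4)])
print(a)
# e.g. {A(10), A(2), A(3), A(4)} (set order is not guaranteed)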

