
Implementing GetHashCode for IEqualityComparer<T> with conditional equality

I'm wondering if anyone has any suggestions for this problem.

I'm using Intersect and Except (LINQ) with a custom IEqualityComparer in order to query the set differences and set intersections of two sequences of ISyncableUsers.

public interface ISyncableUser
{
    string Guid { get; }
    string UserPrincipalName { get; }
}

The logic behind whether two ISyncableUsers are equal is conditional. The conditions center around whether either of the two properties, Guid and UserPrincipalName, has a value. The best way to explain this logic is with code. Below is my implementation of the Equals method of my custom IEqualityComparer.

public bool Equals(ISyncableUser userA, ISyncableUser userB)
{
    if (userA == null && userB == null)
    {
        return true;
    }

    if (userA == null)
    {
        return false;
    }

    if (userB == null)
    {
        return false;
    }

    if ((!string.IsNullOrWhiteSpace(userA.Guid) && !string.IsNullOrWhiteSpace(userB.Guid)) &&
        userA.Guid == userB.Guid)
    {
        return true;
    }

    if (UsersHaveUpn(userA, userB))
    {
        if (userB.UserPrincipalName.Equals(userA.UserPrincipalName, StringComparison.InvariantCultureIgnoreCase))
        {
            return true;
        }
    }
    return false;
}

private bool UsersHaveUpn(ISyncableUser userA, ISyncableUser userB)
{
    return !string.IsNullOrWhiteSpace(userA.UserPrincipalName)
            && !string.IsNullOrWhiteSpace(userB.UserPrincipalName);
}

The problem I'm having is implementing GetHashCode so that the conditional equality above is respected. The only way I've been able to get the Intersect and Except calls to work as expected is to simply always return the same value from GetHashCode(), forcing a call to Equals.

 public int GetHashCode(ISyncableUser obj)
 {
     return 0;
 }

This works, but the performance penalty is huge, as expected. (I've tested this with non-conditional equality. With two sets containing 50,000 objects, a proper hash code implementation allows execution of Intersect and Except in about 40 ms. A hash code implementation that always returns 0 takes approximately 144,000 ms (yes, 2.4 minutes!))
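To make the cost of the constant hash code concrete, here is a small sketch that counts how often Equals is called during a single Intersect. It uses plain ints rather than ISyncableUser, and CountingComparer and HashCostDemo are hypothetical names made up for this illustration. The exact counts depend on the LINQ implementation, so treat the numbers as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer over ints that counts Equals calls.
public sealed class CountingComparer : IEqualityComparer<int>
{
    private readonly bool _constantHash;

    public CountingComparer(bool constantHash)
    {
        _constantHash = constantHash;
    }

    public long EqualsCalls { get; private set; }

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    // A constant hash code places every element in the same bucket,
    // so each lookup has to walk the whole bucket with Equals.
    public int GetHashCode(int obj)
    {
        return _constantHash ? 0 : obj;
    }
}

public static class HashCostDemo
{
    // Runs one Intersect over two overlapping ranges of n distinct ints
    // and returns how many times Equals was called.
    public static long CountEqualsCalls(int n, bool constantHash)
    {
        var comparer = new CountingComparer(constantHash);
        IEnumerable<int> left = Enumerable.Range(0, n);
        IEnumerable<int> right = Enumerable.Range(n / 2, n);

        // Count() forces the lazy query to execute.
        left.Intersect(right, comparer).Count();
        return comparer.EqualsCalls;
    }
}
```

Comparing HashCostDemo.CountEqualsCalls(n, false) with HashCostDemo.CountEqualsCalls(n, true) for growing n should show the constant-hash variant growing roughly quadratically, which is consistent with the timings above.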

So, how would I go about implementing a GetHashCode() in the scenario above?

Any thoughts would be more than welcome!

If we suppose that your Equals implementation is correct, i.e. it's reflexive, transitive and symmetric, then the basic implementation of your GetHashCode function should look like this:


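        public int GetHashCode(ISyncableUser obj)
        {
            if (obj == null)
            {
                return 0; // any fixed constant works for null
            }

            // CanShareNameAcrossGuids is a hypothetical placeholder for the condition
            // "can have user objects with different GUIDs and the same name".
            if (!string.IsNullOrWhiteSpace(obj.UserPrincipalName) &&
                CanShareNameAcrossGuids(obj))
            {
                // Hash the name the same way Equals compares it (case-insensitively).
                return StringComparer.InvariantCultureIgnoreCase.GetHashCode(obj.UserPrincipalName);
            }

            return obj.Guid == null ? 0 : obj.Guid.GetHashCode();
        }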

You should also understand that you've got rather intricate dependencies between your objects.

Indeed, take two ISyncableUser objects, 'u1' and 'u2', such that u1.Guid != u2.Guid but u1.UserPrincipalName == u2.UserPrincipalName, with non-empty names. The requirements on equality then impose that for any ISyncableUser object 'u' with u.Guid == u1.Guid, the condition u.UserPrincipalName == u1.UserPrincipalName must also be true. This reasoning dictates the GetHashCode implementation: for each user object it should be based on either its name or its GUID.

If I'm reading this correctly, your equality relation is not transitive. Picture the following three ISyncableUsers:

A { Guid: "1", UserPrincipalName: "2" }
B { Guid: "2", UserPrincipalName: "2" }
C { Guid: "2", UserPrincipalName: "1" }
  • A == B because they have the same UserPrincipalName
  • B == C because they have the same Guid
  • A != C because they don't share either.
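The three comparisons can be checked mechanically. Below is a minimal sketch: the User class and the TransitivityCheck helper are hypothetical names introduced only for this check, and AreEqual is a condensed restatement of the Equals logic from the question.

```csharp
using System;

public interface ISyncableUser
{
    string Guid { get; }
    string UserPrincipalName { get; }
}

// Hypothetical minimal implementation, used only for this check.
public sealed class User : ISyncableUser
{
    public User(string guid, string upn)
    {
        Guid = guid;
        UserPrincipalName = upn;
    }

    public string Guid { get; }
    public string UserPrincipalName { get; }
}

public static class TransitivityCheck
{
    // Condensed form of the Equals logic from the question.
    public static bool AreEqual(ISyncableUser a, ISyncableUser b)
    {
        if (a == null || b == null) return a == null && b == null;

        // Equal if both GUIDs are present and match.
        if (!string.IsNullOrWhiteSpace(a.Guid) &&
            !string.IsNullOrWhiteSpace(b.Guid) &&
            a.Guid == b.Guid)
        {
            return true;
        }

        // Otherwise equal if both UPNs are present and match, ignoring case.
        return !string.IsNullOrWhiteSpace(a.UserPrincipalName) &&
               !string.IsNullOrWhiteSpace(b.UserPrincipalName) &&
               a.UserPrincipalName.Equals(b.UserPrincipalName,
                   StringComparison.InvariantCultureIgnoreCase);
    }

    // Returns { A == B, B == C, A == C } for the three users above.
    public static bool[] Run()
    {
        var a = new User("1", "2");
        var b = new User("2", "2");
        var c = new User("2", "1");
        return new[] { AreEqual(a, b), AreEqual(b, c), AreEqual(a, c) };
    }
}
```

TransitivityCheck.Run() yields true, true, false: A equals B and B equals C, yet A does not equal C.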

From the spec,

The Equals method is reflexive, symmetric, and transitive. That is, it returns true if used to compare an object with itself; true for two objects x and y if it is true for y and x; and true for two objects x and z if it is true for x and y and also true for y and z.

If your equality relation isn't consistent, there's no way you can implement a hash code that backs it up.

From another point of view: you're essentially looking for three functions:

  • G mapping GUIDs to ints (if you know the GUID but the UPN is blank)
  • U mapping UPNs to ints (if you know the UPN but the GUID is blank)
  • P mapping (guid, upn) pairs to ints (if you know both)

such that G(g) == U(u) == P(g, u) for all g and u. This is only possible if you ignore g and u completely, i.e. if all three functions return the same constant.
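The reasoning behind that claim can be spelled out. A user with only GUID g equals a user with (g, u) through the shared GUID, and a user with only UPN u equals the same (g, u) user through the shared UPN, so equal objects needing equal hash codes forces both equalities. Fixing any u and any two GUIDs g and g' (symbols as in the list above):

```latex
G(g) = P(g, u) = U(u)
\quad\text{and}\quad
G(g') = P(g', u) = U(u)
\;\Longrightarrow\;
G(g) = G(g') \quad \text{for all } g, g'.
```

So G is constant. The same argument with the roles of g and u swapped shows that U is constant, and P is then pinned to that same constant.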

One way would be to maintain a dictionary of hash codes for usernames and GUIDs.

  • You could generate this dictionary once at the start for all users, which would probably be the cleanest solution.

  • You could add or update an entry in the constructor of each user.

  • Or, you could maintain that dictionary inside the GetHashCode function. This means your GetHashCode function has more work to do and is not free of side effects. Getting this to work with multiple threads or Parallel LINQ will need some more careful work, so I don't know whether I would recommend this approach.

Nevertheless, here is my attempt:

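// Requires: using System.Collections.Generic; using System.Runtime.CompilerServices;

// Hash codes already handed out, keyed by GUID.
private readonly Dictionary<string, int> _guidHash =
     new Dictionary<string, int>();

// Hash codes keyed by UPN; case-insensitive to match the comparison in Equals.
private readonly Dictionary<string, int> _nameHash =
     new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);

public int GetHashCode(ISyncableUser obj)
{
    if (obj == null) return 0;

    bool hasGuid = !string.IsNullOrWhiteSpace(obj.Guid);
    bool hasName = !string.IsNullOrWhiteSpace(obj.UserPrincipalName);
    int hash;

    if (hasGuid && _guidHash.TryGetValue(obj.Guid, out hash))
    {
        // Known GUID: also remember the name so that name-only users get the same hash.
        if (hasName && !_nameHash.ContainsKey(obj.UserPrincipalName))
            _nameHash.Add(obj.UserPrincipalName, hash);
        return hash;
    }

    if (hasName && _nameHash.TryGetValue(obj.UserPrincipalName, out hash))
    {
        // Known name: also remember the GUID (it was not found above).
        if (hasGuid) _guidHash.Add(obj.Guid, hash);
        return hash;
    }

    // First sighting: use the identity hash, or some other reasonably unique value.
    hash = RuntimeHelpers.GetHashCode(obj);

    if (hasGuid) _guidHash.Add(obj.Guid, hash);
    if (hasName) _nameHash.Add(obj.UserPrincipalName, hash);

    return hash;
}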

Note that this will fail if the ISyncableUser objects do not play nice and exhibit cases like in Rawling's answer. I am assuming that users with the same GUID will have the same name or no name at all, and users with the same principalName have the same GUID or no GUID at all. (I think the given Equals implementation has the same limitations)
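For the first option, building the lookup once up front, one possible sketch is a union-find over GUID and UPN keys. Every name in it (UserGroupIndex, GetGroupKey, GetGroupHash, User) is hypothetical. It also rests on assumptions beyond the original post: users linked through a chain of shared GUIDs or UPNs are treated as one group, which is the transitive closure of the question's Equals rather than that Equals itself, and UPNs are normalised with ToUpperInvariant, which only approximates the InvariantCultureIgnoreCase comparison.

```csharp
using System;
using System.Collections.Generic;

public interface ISyncableUser
{
    string Guid { get; }
    string UserPrincipalName { get; }
}

// Hypothetical minimal implementation for trying the index out.
public sealed class User : ISyncableUser
{
    public User(string guid, string upn) { Guid = guid; UserPrincipalName = upn; }
    public string Guid { get; }
    public string UserPrincipalName { get; }
}

// Hypothetical helper: groups users that are linked, directly or through
// a chain of other users, by a shared GUID or UPN.
public sealed class UserGroupIndex
{
    // Union-find parent pointers over keys such as "G:<guid>" and "N:<UPN>".
    private readonly Dictionary<string, string> _parent =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public UserGroupIndex(IEnumerable<ISyncableUser> users)
    {
        foreach (ISyncableUser user in users)
        {
            string g = GuidKey(user);
            string n = NameKey(user);
            if (g != null) Find(g); // registers the key
            if (n != null) Find(n);
            if (g != null && n != null) Union(g, n);
        }
    }

    // Representative key of the user's group, or null if both fields are blank.
    public string GetGroupKey(ISyncableUser user)
    {
        if (user == null) return null;
        string key = GuidKey(user) ?? NameKey(user);
        return key == null ? null : Find(key);
    }

    // Users in the same group always get the same hash code.
    public int GetGroupHash(ISyncableUser user)
    {
        string key = GetGroupKey(user);
        return key == null ? 0 : StringComparer.Ordinal.GetHashCode(key);
    }

    private static string GuidKey(ISyncableUser u)
    {
        return string.IsNullOrWhiteSpace(u.Guid) ? null : "G:" + u.Guid;
    }

    // Assumption: upper-casing approximates the case-insensitive UPN comparison.
    private static string NameKey(ISyncableUser u)
    {
        return string.IsNullOrWhiteSpace(u.UserPrincipalName)
            ? null
            : "N:" + u.UserPrincipalName.ToUpperInvariant();
    }

    private string Find(string key)
    {
        if (!_parent.ContainsKey(key)) _parent[key] = key;

        string root = key;
        while (_parent[root] != root) root = _parent[root];

        // Path compression: point every key on the walked path at the root.
        while (_parent[key] != root)
        {
            string next = _parent[key];
            _parent[key] = root;
            key = next;
        }
        return root;
    }

    private void Union(string a, string b)
    {
        string rootA = Find(a);
        string rootB = Find(b);
        if (rootA != rootB) _parent[rootA] = rootB;
    }
}
```

A comparer built on this would return GetGroupHash from GetHashCode and, to stay consistent, compare GetGroupKey values in Equals. That changes the semantics: the three users from Rawling's answer all land in one group and compare equal. Lookups mutate the dictionary through path compression, so the sketch is not thread-safe as written.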
