
Removing duplicates from a List or HashSet in C#

I have a very simple test method that returns a List containing a number of duplicates. When I could not get rid of them, I thought I'd try a HashSet, as that should remove duplicates. It appears I need to override Equals and GetHashCode, but I am really struggling to understand what I need to do. I would appreciate some pointers, please.

HashSet<object> test = XmlManager.PeriodHashSet(Server.MapPath("../Xml/XmlFile.xml"));
foreach (Object period in test)
{
    PeriodData pd = period as PeriodData;
    Response.Write(pd.PeriodName + "<br>");
}

I also tried it with the following:

List<object> test = XmlManager.PeriodList(Server.MapPath("../Xml/XmlFile.xml"));
List<object> test2 = test.Distinct().ToList();
foreach (Object period in test2)
{
    PeriodData pd = period as PeriodData;
    Response.Write(pd.PeriodName + "<br>");
}

The PeriodData object is declared as follows:

public class PeriodData
{
    private int m_StartYear = -9999999;
    private int m_EndYear = -9999999;
    private string m_PeriodName = String.Empty;

    public int StartYear
    {
        get { return m_StartYear; }
        set { m_StartYear = value; }
    }
    public int EndYear
    {
        get { return m_EndYear; }
        set { m_EndYear = value; }
    }
    public string PeriodName
    {
        get { return m_PeriodName; }
        set { m_PeriodName = value; }
    }
}

It is the returned PeriodName that I want to remove duplicates of.

For HashSet<T> to work, you need to, at a minimum, override Object.Equals and Object.GetHashCode. These two methods are what let the set decide whether two objects are "distinct" or the same by value.

In terms of simplifying and improving the code, there are two major changes I'd recommend to make this work:

First, you should use HashSet<PeriodData> (or List<PeriodData>), not HashSet<object>.

Second, your PeriodData class should implement IEquatable<PeriodData> in order to provide proper hashing and equality.

You have to decide what makes two periods equal. If all three properties have to be the same for two periods to be equal, then you can implement Equals thus:

public override bool Equals(object obj)
{
    if (ReferenceEquals(null, obj)) return false;
    if (ReferenceEquals(this, obj)) return true;
    if (obj.GetType() != this.GetType()) return false;
    PeriodData other = (PeriodData)obj;
    return m_StartYear == other.m_StartYear && m_EndYear == other.m_EndYear && string.Equals(m_PeriodName, other.m_PeriodName);
}

For GetHashCode, you could do something like this:

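public override int GetHashCode()
{
    // unchecked: let the multiplication wrap even if overflow checking is enabled
    unchecked
    {
        // treat a null name as 0 so the hash stays consistent with Equals
        return (((m_StartYear * 397) ^ m_EndYear) * 397) ^ (m_PeriodName != null ? m_PeriodName.GetHashCode() : 0);
    }
}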

(Credit where it is due: these are adapted from the code generated by ReSharper's code generation tool.)

As others have noted, it would be better to implement IEquatable<T> as well.
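
As a rough sketch of what that could look like, assuming all three properties must match for two periods to be equal: the class below uses auto-properties instead of the original backing fields and drops the default values, purely to keep it short. Note that the typed Equals is only picked up by typed collections such as HashSet<PeriodData>; with List<object> the framework only ever calls Equals(object), so the override has to stay.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the PeriodData class from the question
// (auto-properties, no default values): an assumption for brevity.
public class PeriodData : IEquatable<PeriodData>
{
    public int StartYear { get; set; }
    public int EndYear { get; set; }
    public string PeriodName { get; set; }

    // Strongly typed equality, used by EqualityComparer<PeriodData>.Default.
    public bool Equals(PeriodData other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        if (other.GetType() != GetType()) return false;
        return StartYear == other.StartYear
            && EndYear == other.EndYear
            && string.Equals(PeriodName, other.PeriodName);
    }

    // Keep the object overload consistent with the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as PeriodData);
    }

    // Must agree with Equals: equal objects produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = (StartYear * 397) ^ EndYear;
            return (hash * 397) ^ (PeriodName != null ? PeriodName.GetHashCode() : 0);
        }
    }
}
```

With this in place, adding two PeriodData instances that carry the same three values to a HashSet<PeriodData> should leave only one of them in the set. One caveat: the hash is computed from mutable properties, so changing an object after it has been added to a set will confuse lookups.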

If you cannot modify the class, or you do not want to modify it, you can put the equality comparison logic in another class that implements IEqualityComparer<PeriodData>, which you can pass to the appropriate constructor of HashSet<PeriodData> and to Enumerable.Distinct().
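
Since the question only cares about duplicate PeriodName values, a comparer along these lines might be closer to what is wanted. This is a sketch only: PeriodNameComparer and PeriodDeduplication are made-up names, the PeriodData class is a simplified stand-in for the one in the question, and names are compared ordinally (case-sensitive), which is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the PeriodData class from the question.
public class PeriodData
{
    public int StartYear { get; set; }
    public int EndYear { get; set; }
    public string PeriodName { get; set; }
}

// Hypothetical comparer: two periods count as equal when PeriodName matches.
public sealed class PeriodNameComparer : IEqualityComparer<PeriodData>
{
    public bool Equals(PeriodData x, PeriodData y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return string.Equals(x.PeriodName, y.PeriodName, StringComparison.Ordinal);
    }

    // Hash only the field that Equals looks at.
    public int GetHashCode(PeriodData obj)
    {
        if (ReferenceEquals(obj, null) || obj.PeriodName == null) return 0;
        return obj.PeriodName.GetHashCode();
    }
}

// Hypothetical helpers showing both places a comparer can be supplied.
public static class PeriodDeduplication
{
    public static List<PeriodData> DistinctByName(IEnumerable<PeriodData> periods)
    {
        return periods.Distinct(new PeriodNameComparer()).ToList();
    }

    public static HashSet<PeriodData> ToNameSet(IEnumerable<PeriodData> periods)
    {
        return new HashSet<PeriodData>(periods, new PeriodNameComparer());
    }
}
```

The advantage of this route is that PeriodData itself stays untouched, and different callers can use different notions of "duplicate" by passing different comparers.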

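To make Distinct() work, you have to implement IEquatable<T> (together with matching Equals and GetHashCode overrides), or else pass an IEqualityComparer<T> to Distinct().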

How would the framework know how to say "those two objects are identical" if you don't? You have to give the framework a way to compare your objects; that is the purpose of the IEquatable<T> implementation.
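
To see the default behaviour for yourself, here is a small illustration. PlainPeriod and DefaultEqualityDemo are made-up names; the class deliberately has no equality members, so Distinct() falls back to Object.Equals, which for a class means reference equality.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class with no Equals/GetHashCode overrides, for illustration only.
public class PlainPeriod
{
    public string PeriodName { get; set; }
}

public static class DefaultEqualityDemo
{
    public static int CountDistinct()
    {
        var periods = new List<PlainPeriod>
        {
            new PlainPeriod { PeriodName = "Early" },
            new PlainPeriod { PeriodName = "Early" }
        };

        // No custom equality: the two instances are different references,
        // so Distinct() treats them as different and keeps both.
        return periods.Distinct().Count();
    }
}
```

CountDistinct() returns 2 here even though both objects carry the same name, which matches the behaviour described in the question.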
