How to create a HashSet<List<int>> with distinct elements?

I have a HashSet that contains multiple lists of integers, i.e. HashSet<List<int>>.

In order to maintain uniqueness I currently have to do two things:

1. Manually loop through the existing lists, looking for duplicates using SequenceEqual.
2. Sort the individual lists so that SequenceEqual works correctly.

Is there a better way to do this? Is there an existing IEqualityComparer that I can provide to the HashSet so that HashSet.Add() can automatically handle uniqueness?

var hashSet = new HashSet<List<int>>();

for(/* some condition */)
{
    List<int> list = new List<int>();
    bool validPartition = true;

    ...

    /* for eliminating duplicate lists */
    list.Sort();

    foreach(var set in hashSet)
    {
        if (list.SequenceEqual(set))
        {
            validPartition = false;
            break;
        }
    }

    if (validPartition)
        hashSet.Add(list);
}

This starts off wrong: it has to be a HashSet<ReadOnlyCollection<>>, because you cannot allow the lists to change and invalidate the set predicate. That in turn lets you calculate a hash code in O(n) when you add the collection to the set. The test for whether it is already in the set is then also O(n), with a very uncommon O(n^2) worst case if all the hashes turn out to be equal. Store the computed hash with the collection.
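As a rough sketch of that idea, assuming int elements: the wrapper below copies its input into a ReadOnlyCollection<int> and computes the hash once, in the constructor. The class name HashedSequence and the hash constants are made up for illustration; they are not from the answer above. Element order still matters here, so lists would need to be sorted before being wrapped.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical wrapper: an immutable snapshot of a list plus its precomputed hash.
public sealed class HashedSequence : IEquatable<HashedSequence>
{
    private readonly int _hash;

    public ReadOnlyCollection<int> Items { get; }

    public HashedSequence(IEnumerable<int> source)
    {
        // Copy the input so later changes to the source cannot affect the set.
        Items = new List<int>(source).AsReadOnly();

        // O(n) order-sensitive hash, computed once at construction.
        int hash = 17;
        foreach (int item in Items)
            hash = unchecked(hash * 31 + item);
        _hash = hash;
    }

    public bool Equals(HashedSequence other)
    {
        // Cheap rejections first; the O(n) comparison only runs on a hash match.
        return other != null
            && _hash == other._hash
            && Items.Count == other.Items.Count
            && Items.SequenceEqual(other.Items);
    }

    public override bool Equals(object obj) => Equals(obj as HashedSequence);

    public override int GetHashCode() => _hash;
}
```

Because the type implements IEquatable<HashedSequence>, a plain new HashSet<HashedSequence>() should pick up this equality without a separate comparer.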

Here is a possible comparer that compares an IEnumerable<T> by its elements. You still need to sort manually before adding.

One could build the sorting into the comparer, but I don't think that's a wise choice. Adding a canonical form of the list seems wiser.

This code will only work in .NET 4 or later, since it takes advantage of generic variance. If you need earlier versions, you need to either replace IEnumerable with List, or add a second generic parameter for the collection type.
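The "second generic parameter" option could look roughly like the sketch below. The TCollection parameter is introduced here for illustration; it names the concrete collection type, so the comparer does not rely on variance. The sketch assumes the collections themselves are non-null.

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch of a variance-free comparer: TCollection is the concrete collection
// type (e.g. List<int>), T is its element type.
public class SequenceComparer<TCollection, T> : IEqualityComparer<TCollection>
    where TCollection : IEnumerable<T>
{
    public bool Equals(TCollection seq1, TCollection seq2)
    {
        // Element-by-element, order-sensitive comparison.
        return Enumerable.SequenceEqual<T>(seq1, seq2);
    }

    public int GetHashCode(TCollection seq)
    {
        int hash = 1234567;
        foreach (T elem in seq)
        {
            // The default comparer returns 0 for null elements instead of throwing.
            hash = unchecked(hash * 37 + EqualityComparer<T>.Default.GetHashCode(elem));
        }
        return hash;
    }
}
```

It would be passed to the set as new HashSet<List<int>>(new SequenceComparer<List<int>, int>()). As with the single-parameter version, lists still need to be sorted before they are added.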

class SequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> seq1, IEnumerable<T> seq2)
    {
        return seq1.SequenceEqual(seq2);
    }

    public int GetHashCode(IEnumerable<T> seq)
    {
        // Order-sensitive hash over the elements; unchecked lets it wrap on overflow.
        int hash = 1234567;
        foreach (T elem in seq)
            hash = unchecked(hash * 37 + elem.GetHashCode());
        return hash;
    }
}

void Main()
{
    var hashSet = new HashSet<List<int>>(new SequenceComparer<int>());

    List<int> test = new int[] { 1, 3, 2 }.ToList();
    test.Sort();
    hashSet.Add(test);

    List<int> test2 = new int[] { 3, 2, 1 }.ToList();
    test2.Sort();
    hashSet.Contains(test2).Dump(); // Dump() is LINQPad's output helper; shows True
}

Is there a reason you aren't just using an array? int[] will perform better. Also, I assume the lists contain duplicates; otherwise you'd just be using sets and not have a problem.
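For the case where no list repeats a value, the "just use sets" route can be sketched with the framework's built-in set comparer, HashSet<T>.CreateSetComparer(). The helper name CountDistinct is made up for this example. This does not apply if duplicates inside a list are significant, since converting to a set discards them.

```csharp
using System.Collections.Generic;

public static class SetOfSetsDemo
{
    // Counts how many inputs are distinct when element order is ignored.
    // Assumes duplicates within a single input do not matter.
    public static int CountDistinct(IEnumerable<int[]> lists)
    {
        // CreateSetComparer() compares sets by content, not by reference.
        var sets = new HashSet<HashSet<int>>(HashSet<int>.CreateSetComparer());
        foreach (int[] list in lists)
            sets.Add(new HashSet<int>(list));
        return sets.Count;
    }
}
```

With this comparer no manual sorting is needed, because set equality ignores order.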

It appears that their contents won't change (much) once they've been added to the HashSet. At the end of the day, you are going to have to use a comparer that falls back on SequenceEqual. But you don't have to do it every single time. Instead of doing an ever-growing number of sequence comparisons (e.g. as the hash set grows, doing a SequenceEqual against each existing member, which is quadratic in the number of lists overall), if you create a good hash code up front you may have to do very few such comparisons. While the overhead of generating a good hash code is probably about the same as doing a SequenceEqual, you're only doing it a single time for each list.

So, the first time you operate on a particular List<int>, you should generate a hash based on the ordered sequence of numbers and cache it. Then the next time the list is compared, the cached value can be used. I'm not sure off the top of my head how you might do this with a comparer (maybe a static dictionary?), but you could implement a List wrapper that does this easily.

Here's a basic idea. You'd need to be careful to ensure that it isn't brittle (e.g. make sure you invalidate any cached hash code when members change), but it doesn't look like that's going to be a typical situation for the way you're using this.

public class FasterComparingList<T>: IList<T>, IList, ... 
    /// whatever you need to implement
{
   // Implement your interfaces against InnerList
   // Any methods that change members of the list need to
   // set _LongHash=null to force it to be regenerated
   public List<T> InnerList { ... lazy load a List }
   private int? _LongHash=null;
   public override int GetHashCode()
   {
       if (_LongHash==null) {
           _LongHash=GetLongHash();
       }
       return (int)_LongHash;
   }
   public bool Equals(FasterComparingList<T> list)
   {
       // lists of different lengths can never be equal
       if (InnerList.Count!=list.InnerList.Count) {
           return false;
       }
       // you could also cache the sorted state and skip this if a list hasn't
       // changed since the last sort
       // not sure if native `List` does
       list.InnerList.Sort();
       InnerList.Sort();
       return InnerList.SequenceEqual(list.InnerList);
   }
   protected int GetLongHash()
   {
       return .....
       // something to create a reasonably good hash code -- which depends on the 
       // data. Adding all the numbers is probably fine, even if it fails a couple 
       // percent of the time you're still orders of magnitude ahead of sequence
       // compare each time
   } 
}

If the lists won't change once added, this should be very fast. Even in situations where the lists could change frequently, the time to create a new hash code is not likely to be very different from (if it is greater at all than) the time to do a sequence compare.
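A cut-down, compilable take on the wrapper idea might look like the following. To stay short it is restricted to int elements and exposes only Add and a read-only view instead of implementing IList<T>. The class name CachedHashList is hypothetical, and the hash is the plain sum of the elements, which is order-independent but only one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplified wrapper with a lazily computed, cached hash code.
public sealed class CachedHashList : IEquatable<CachedHashList>
{
    private readonly List<int> _items = new List<int>();
    private int? _cachedHash; // null means "needs recomputing"

    public IReadOnlyList<int> Items => _items;

    public void Add(int value)
    {
        _items.Add(value);
        _cachedHash = null; // any mutation invalidates the cached hash
    }

    public override int GetHashCode()
    {
        if (_cachedHash == null)
        {
            // Sum of elements: cheap and independent of element order.
            int sum = 0;
            foreach (int item in _items)
                sum = unchecked(sum + item);
            _cachedHash = sum;
        }
        return _cachedHash.Value;
    }

    public bool Equals(CachedHashList other)
    {
        if (other == null || _items.Count != other._items.Count)
            return false;
        if (GetHashCode() != other.GetHashCode())
            return false;
        // Compare sorted copies so neither list is reordered as a side effect.
        return _items.OrderBy(x => x).SequenceEqual(other._items.OrderBy(x => x));
    }

    public override bool Equals(object obj) => Equals(obj as CachedHashList);
}
```

One caveat with any mutable wrapper: changing an instance after it has been added to a HashSet changes its hash code, so the set may no longer find it. Invalidating the cache keeps the wrapper consistent, not the set.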

If you don't specify an IEqualityComparer, then the type's default will be used, so I think what you'll need to do is create your own implementation of IEqualityComparer and pass that to the constructor of your HashSet. Here is a good example.

When comparing hash sets of lists, one option you always have is that, instead of comparing each element, you sort the lists, join them using a comma, and compare the generated strings.

So, in this case, when you create a custom comparer you can apply this logic instead of iterating over the elements and calculating a custom hash function.
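A minimal sketch of that string-key approach, assuming List<int> elements: the class name SortedStringKeyComparer and the Key helper are invented for the example. It sorts a copy on every call, so it trades speed and allocations for simplicity.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two lists are equal when their sorted,
// comma-joined string forms match.
public class SortedStringKeyComparer : IEqualityComparer<List<int>>
{
    private static string Key(List<int> list)
    {
        // OrderBy sorts a copy, so the caller's list is left untouched.
        return string.Join(",", list.OrderBy(x => x));
    }

    public bool Equals(List<int> x, List<int> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return Key(x) == Key(y);
    }

    public int GetHashCode(List<int> list)
    {
        return list == null ? 0 : Key(list).GetHashCode();
    }
}
```

With this comparer passed to the HashSet constructor, lists can be added unsorted, since the canonical form is built inside the comparer.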
