Removing duplicates without Linq for IList of IList

What is the most efficient way to remove duplicates from an IList in C# without Linq?

I have the following code from another SO answer [1]:

IList<IList<int>> output = new List<IList<int>>();
var lists = output;
for (int i = 0; i < lists.Count; ++i)
{
    // Since we want to compare sequences, sort both sides so the items are in the same order
    var item = lists[i].OrderBy(x => x).ToArray();
    for (int j = lists.Count - 1; j > i; --j)
        if (item.SequenceEqual(lists[j].OrderBy(x => x)))
            lists.RemoveAt(j);
}

I am using this in a bigger coding challenge where Linq and syntactic sugar are not allowed, so I am trying to see whether there is an elegant, fast solution.

I am thinking of just using a hash, but I am not sure what kind of hash function to use to detect that a list has already been seen.

More clearly, for an input like

{{1,2,4, 4}, {3,4,5}, {4,2,1,4} }

the intermediate output is (sorted input/output is fine)

{{1,2,4,4}, {3,4,5}, {1,2,4,4} }

Output:

{{1,2,4,4}, {3,4,5}}

I have used a modified version of the internals of CollectionAssert.AreEquivalent from Microsoft:

using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        var lists = new List<List<int>>
        {
            new List<int> {1, 4, 2},
            new List<int> {3, 4, 5},
            new List<int> {1, 2, 4}
        };

        var dedupe =
            new List<List<int>>(new HashSet<List<int>>(lists, new MultiSetComparer<int>()));
    }

    // Equal if both sequences contain the same items the same number of times, in any order
    public class MultiSetComparer<T> : IEqualityComparer<IEnumerable<T>>
    {
        public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
        {
            if (first == null)
                return second == null;

            if (second == null)
                return false;

            if (ReferenceEquals(first, second))
                return true;

            // Shortcut when we can cheaply look at counts
            var firstCollection = first as ICollection<T>;
            var secondCollection = second as ICollection<T>;
            if (firstCollection != null && secondCollection != null)
            {
                if (firstCollection.Count != secondCollection.Count)
                    return false;

                if (firstCollection.Count == 0)
                    return true;
            }

            // Now compare elements
            return !HaveMismatchedElement(first, second);
        }

        private static bool HaveMismatchedElement(IEnumerable<T> first, IEnumerable<T> second)
        {
            int firstNullCount;
            int secondNullCount;

            // Create dictionary of unique elements with their counts
            var firstElementCounts = GetElementCounts(first, out firstNullCount);
            var secondElementCounts = GetElementCounts(second, out secondNullCount);

            if (firstNullCount != secondNullCount || firstElementCounts.Count != secondElementCounts.Count)
                return true;

            // make sure the counts for each element are equal, exiting early as soon as they differ
            foreach (var kvp in firstElementCounts)
            {
                var firstElementCount = kvp.Value;
                int secondElementCount;
                secondElementCounts.TryGetValue(kvp.Key, out secondElementCount);

                if (firstElementCount != secondElementCount)
                    return true;
            }

            return false;
        }

        private static Dictionary<T, int> GetElementCounts(IEnumerable<T> enumerable, out int nullCount)
        {
            var dictionary = new Dictionary<T, int>();
            nullCount = 0;

            foreach (T element in enumerable)
            {
                if (element == null)
                {
                    nullCount++;
                }
                else
                {
                    int num;
                    dictionary.TryGetValue(element, out num);
                    num++;
                    dictionary[element] = num;
                }
            }

            return dictionary;
        }

        public int GetHashCode(IEnumerable<T> enumerable)
        {
            int hash = 17;
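            // Create and sort a list in place, rather than OrderBy(x => x), because Linq is forbidden in this question.
            // List<T>.Sort() uses the default comparer, so T is assumed to be comparable (for example int).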
            var list = new List<T>(enumerable);
            list.Sort();
            foreach (T val in list)
                hash = hash * 23 + (val == null ? 42 : val.GetHashCode());

            return hash;
        }
    }
}

This uses HashSet<T>; adding to this collection automatically ignores duplicates.

The last line could read:

var dedupe = new HashSet<List<int>>(lists, new MultiSetComparer<int>()).ToList();

Technically that uses the System.Linq namespace, but I don't think this is your concern with Linq.
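Since the question asks about removing duplicates from the IList itself, the same HashSet<T> behaviour can also drive an in-place removal. The following is only a minimal sketch, not part of the code above: the names DedupeSketch, CanonicalKey and RemoveDuplicates are made up for illustration, it swaps the comparer for a simpler sorted-string key, and it assumes the elements are int and the outer list is resizable (a List<T>, not an array).

```csharp
using System;
using System.Collections.Generic;

public static class DedupeSketch
{
    // Hypothetical helper: builds an order-insensitive key by sorting a copy
    // of the list and joining the items. Unambiguous for int elements.
    public static string CanonicalKey(IList<int> list)
    {
        var copy = new List<int>(list);
        copy.Sort();
        return string.Join(",", copy);
    }

    // Removes later duplicates in place, keeping the first occurrence of each multiset.
    public static void RemoveDuplicates(IList<IList<int>> lists)
    {
        var seen = new HashSet<string>();
        int write = 0;
        for (int read = 0; read < lists.Count; read++)
        {
            // HashSet<T>.Add returns false when the key is already present.
            if (seen.Add(CanonicalKey(lists[read])))
            {
                lists[write] = lists[read];
                write++;
            }
        }

        // Trim the leftover tail, starting from the end of the list.
        for (int i = lists.Count - 1; i >= write; i--)
            lists.RemoveAt(i);
    }

    public static void Main()
    {
        IList<IList<int>> lists = new List<IList<int>>
        {
            new List<int> {1, 2, 4, 4},
            new List<int> {3, 4, 5},
            new List<int> {4, 2, 1, 4}
        };

        RemoveDuplicates(lists);

        foreach (var list in lists)
            Console.WriteLine(string.Join(", ", list));
    }
}
```

With the sample input from the question this prints 1, 2, 4, 4 and 3, 4, 5. Compacting with a read and a write index and trimming the tail once avoids calling RemoveAt in the middle of the list for every duplicate.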

I will echo what Eric Lippert has said. You are asking us to show you the raw workings of Linq and the framework internals, but it is not a closed box. Also, if you think that looking at the source code of these methods will reveal obvious inefficiencies and opportunities to optimize, in my experience those are often not easy to spot; you are better off reading the docs and measuring.
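As a rough illustration of the "measure" advice, the sketch below times a pairwise comparison against a hash-based one using System.Diagnostics.Stopwatch. Everything in it (MeasureSketch, MakeInput, the input sizes and the seed) is a hypothetical setup chosen for the example, not a benchmark from this thread, and a single Stopwatch run is only a first approximation.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class MeasureSketch
{
    static List<int> SortedCopy(IList<int> list)
    {
        var copy = new List<int>(list);
        copy.Sort();
        return copy;
    }

    static bool SameItems(List<int> a, List<int> b)
    {
        if (a.Count != b.Count) return false;
        for (int i = 0; i < a.Count; i++)
            if (a[i] != b[i]) return false;
        return true;
    }

    // Quadratic baseline: compares each sorted list against every list kept so far.
    public static int CountDistinctPairwise(IList<IList<int>> lists)
    {
        var kept = new List<List<int>>();
        foreach (var list in lists)
        {
            var sorted = SortedCopy(list);
            bool duplicate = false;
            foreach (var other in kept)
                if (SameItems(other, sorted)) { duplicate = true; break; }
            if (!duplicate) kept.Add(sorted);
        }
        return kept.Count;
    }

    // Hash-based variant: one set lookup per list, keyed by the sorted items.
    public static int CountDistinctHashed(IList<IList<int>> lists)
    {
        var seen = new HashSet<string>();
        foreach (var list in lists)
            seen.Add(string.Join(",", SortedCopy(list)));
        return seen.Count;
    }

    // Hypothetical input generator: `count` lists of `length` digits each.
    public static IList<IList<int>> MakeInput(int count, int length, int seed)
    {
        var rng = new Random(seed);
        IList<IList<int>> result = new List<IList<int>>(count);
        for (int i = 0; i < count; i++)
        {
            var inner = new List<int>(length);
            for (int j = 0; j < length; j++)
                inner.Add(rng.Next(0, 10));
            result.Add(inner);
        }
        return result;
    }

    public static void Main()
    {
        var input = MakeInput(5000, 6, 12345);

        var sw = Stopwatch.StartNew();
        int pairwise = CountDistinctPairwise(input);
        sw.Stop();
        Console.WriteLine("pairwise: " + pairwise + " distinct, " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        int hashed = CountDistinctHashed(input);
        sw.Stop();
        Console.WriteLine("hashed:   " + hashed + " distinct, " + sw.ElapsedMilliseconds + " ms");
    }
}
```

Both methods should report the same number of distinct lists; the timings depend on the machine and the input shape, so it is worth trying sizes close to the real workload.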

I think this is much simpler than the accepted answer, and it doesn't use the System.Linq namespace at all.

using System;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        IList<IList<int>> lists = new List<IList<int>>
        {
            new List<int> {1, 2, 4, 4},
            new List<int> {3, 4, 5},
            new List<int> {4, 2, 1, 4},
            new List<int> {1, 2, 2},
            new List<int> {1, 2},
        };

        // There is no Multiset data structure in C#, but we can represent it as a set of tuples,
        // where each tuple contains an item and the number of its occurrences.

        // The dictionary below does not allow the same multiset to be added twice, while keeping track of the original lists.
        var multisets = new Dictionary<HashSet<Tuple<int, int>>, IList<int>>(HashSet<Tuple<int, int>>.CreateSetComparer());
        foreach (var list in lists)
        {
            // Count the number of occurrences of each item in the list.
            var set = new Dictionary<int, int>();
            foreach (var item in list)
            {
                int occurrences;
                set[item] = set.TryGetValue(item, out occurrences) ? occurrences + 1 : 1;
            }

            // Create a set of tuples that we could compare.
            var multiset = new HashSet<Tuple<int, int>>();
            foreach (var kv in set)
            {
                multiset.Add(Tuple.Create(kv.Key, kv.Value));
            }

            if (!multisets.ContainsKey(multiset))
            {
                multisets.Add(multiset, list);
            }
        }

        // Print results.
        foreach (var list in multisets.Values)
        {
            Console.WriteLine(string.Join(", ", list));
        }
    }
}

And the output will be:

1, 2, 4, 4
3, 4, 5
1, 2, 2
1, 2
