Most efficient way to remove multiple items from an IList<T>

What is the most efficient way to remove multiple items from an IList<T> object? Suppose I have an IEnumerable<T> of all the items I want to remove, in the same order of occurrence as in the original list.

The only way I have in mind is:

IList<T> items;
IEnumerable<T> itemsToDelete;
...

foreach (var x in itemsToDelete)
{
    items.Remove(x);
}

But I guess it's not efficient, because it has to go over the list from the beginning every time the Remove method is called.

As the number of items to remove gets larger, you will probably find that traversing the list and checking each item against a hash set of "items to remove" is more efficient. An extension method like this might help:

static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove)
{
    var set = new HashSet<T>(itemsToRemove);

    var list = iList as List<T>;
    if (list == null)
    {
        int i = 0;
        while (i < iList.Count)
        {
            if (set.Contains(iList[i])) iList.RemoveAt(i);
            else i++;
        }
    }
    else
    {
        list.RemoveAll(set.Contains);
    }
}

I benchmarked using the little program below. (Note that it uses an optimized path if the IList<T> is actually a List<T>.)

On my machine (and using my test data), this extension method took 1.5 seconds to execute vs 17 seconds for the code in your question. However, I have not tested with different sizes of data. I'm sure that for removing just a couple of items, RemoveAll2 will be faster.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class Program
{
    static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove)
    {
        var set = new HashSet<T>(itemsToRemove);

        var list = iList as List<T>;
        if (list == null)
        {
            int i = 0;
            while (i < iList.Count)
            {
                if (set.Contains(iList[i])) iList.RemoveAt(i);
                else i++;
            }
        }
        else
        {
            list.RemoveAll(set.Contains);
        }
    }

    static void RemoveAll2<T>(this IList<T> list, IEnumerable<T> itemsToRemove)
    {
        foreach (var item in itemsToRemove)
            list.Remove(item);
    }

    static void Main(string[] args)
    {
        var list = Enumerable.Range(0, 10000).ToList();
        var toRemove = new[] { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 
                              43,  47,  53,  59,  61,  67,  71,  73,  79,  83,  89,  97, 101,
                             103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167,
                             173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239,
                             241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313,
                             317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397,
                             401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467,
                             479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569,
                             571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643,
                             647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733,
                             739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823,
                             827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
                             919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997};
        list.RemoveAll(toRemove); // JIT 
        //list.RemoveAll2(toRemove); // JIT 

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10000; i++)
        {
            list.RemoveAll(toRemove);
            //list.RemoveAll2(toRemove);
        }
        sw.Stop();
        Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
        Console.ReadKey();
    }
}

UPDATE (for @KarmaEDV's & Mark Sowul's comments below): If you need to use a custom equality comparer, the extension method could have an overload that takes such a comparer:

public static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove, IEqualityComparer<T> comparer = null)
{
    var set = new HashSet<T>(itemsToRemove, comparer ?? EqualityComparer<T>.Default);

    if (iList is List<T> list)
    {
        list.RemoveAll(set.Contains);
    }
    else
    {
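        // Walk backwards and always decrement, so that removing the last
        // element never leaves i pointing past the end of the list.
        for (int i = iList.Count - 1; i >= 0; i--)
        {
            if (set.Contains(iList[i])) iList.RemoveAt(i);
        }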
    }
}
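
As a usage sketch for the comparer overload: the snippet below removes strings case-insensitively from a Collection<string>, which is an IList<T> but not a List<T>, so the fallback loop runs. The class names ListExtensions and Demo and the sample data are assumptions made for illustration, and the fallback is written as a reverse for loop.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

static class ListExtensions
{
    public static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove,
                                    IEqualityComparer<T> comparer = null)
    {
        var set = new HashSet<T>(itemsToRemove, comparer ?? EqualityComparer<T>.Default);

        if (iList is List<T> list)
        {
            // Fast path: List<T> has a built-in single-pass RemoveAll.
            list.RemoveAll(set.Contains);
        }
        else
        {
            // Fallback: walk backwards so RemoveAt never shifts unvisited items.
            for (int i = iList.Count - 1; i >= 0; i--)
            {
                if (set.Contains(iList[i])) iList.RemoveAt(i);
            }
        }
    }
}

static class Demo
{
    static void Main()
    {
        // Sample data is illustrative only.
        IList<string> names = new Collection<string> { "Ann", "bob", "Cid", "BOB" };
        names.RemoveAll(new[] { "Bob" }, StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(string.Join(", ", names)); // Ann, Cid
    }
}
```

Both "bob" and "BOB" are removed because the hash set is built with the supplied comparer, so a single entry in itemsToRemove matches every case variant.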

If the IList<T> reference happens to refer to an instance of List<T>, casting to that type and using RemoveAll is apt to yield better performance than any other approach that doesn't rely upon the particulars of its implementation.

Otherwise, while the optimal approach will depend upon the relative fraction of items that are going to be removed and the nature of the IList<T>, I would suggest that your best bet might be to copy the IList<T> to a new List<T>, clear the original, and selectively re-add items. Even if the items in the list are not conducive to efficient hashing, the fact that the items in the IEnumerable<T> are in the same sequence as those in the IList<T> would render that irrelevant. Start by reading an item from the IEnumerable<T>. Then copy items from the copy back to the list until that one is found. Then read the next item from the IEnumerable<T> and continue copying until that one is found, and so on. Once the IEnumerable<T> is exhausted, copy the balance of the copy back to the list.
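
The copy, clear, and re-add steps above could be sketched roughly as follows. This is only one possible reading of the description, not the answerer's code: the names RebuildExtensions and RemoveInOrder are hypothetical, the default equality comparer is assumed, and the method relies on itemsToRemove listing items in the same relative order as they occur in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

static class RebuildExtensions
{
    // Hypothetical helper. Assumes itemsToRemove is ordered like the list.
    public static void RemoveInOrder<T>(this IList<T> list, IEnumerable<T> itemsToRemove)
    {
        var comparer = EqualityComparer<T>.Default;
        // Snapshot first, in case itemsToRemove is a lazy query over list.
        var toRemove = new List<T>(itemsToRemove);
        var copy = new List<T>(list);
        list.Clear();

        int next = 0; // index of the next item we are waiting to skip
        foreach (var item in copy)
        {
            if (next < toRemove.Count && comparer.Equals(item, toRemove[next]))
                next++;          // matched: drop it and wait for the following one
            else
                list.Add(item);  // not the awaited item: keep it
        }
    }
}

static class Demo
{
    static void Main()
    {
        // Sample data is illustrative only.
        IList<int> items = new Collection<int> { 1, 2, 3, 4, 5, 6 };
        items.RemoveInOrder(new[] { 2, 5 });
        Console.WriteLine(string.Join(", ", items)); // 1, 3, 4, 6
    }
}
```

No hashing is involved: each element is compared only against the single item currently awaited, so the pass is linear as long as Clear and Add are cheap for the underlying IList<T>.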

This approach will be fast with many implementations of IList<T>. It has one major disadvantage, though: the fact that it deletes and re-adds each item might have unwanted side effects on things like observable lists. If a list might be observable, one may have to use a much slower O(N^2) algorithm to ensure correctness. [BTW, it irks me that IList<T> has a Remove(T) method but lacks a much more useful RemoveAll(Func<T,bool>) method. Remove(T) is largely redundant with IndexOf and RemoveAt, while RemoveAll would allow O(N) implementations of many operations that are O(N^2) in its absence if one isn't allowed to remove and re-add items.]

Maybe this helps. Other ideas of the same type could be included.
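
To illustrate the remark about redundancy, here is a minimal sketch of Remove(T) expressed through IndexOf and RemoveAt. The helper name RemoveViaIndexOf and the class names are hypothetical, and actual IList<T> implementations are free to implement Remove differently.

```csharp
using System;
using System.Collections.Generic;

static class RemoveSketch
{
    // Hypothetical helper: behaves like IList<T>.Remove for typical lists.
    public static bool RemoveViaIndexOf<T>(IList<T> list, T item)
    {
        int index = list.IndexOf(item); // linear search from the start
        if (index < 0) return false;    // not found: nothing to remove
        list.RemoveAt(index);           // shifts the tail in array-backed lists
        return true;
    }
}

static class Demo
{
    static void Main()
    {
        // Sample data is illustrative only.
        IList<string> letters = new List<string> { "a", "b", "c" };
        Console.WriteLine(RemoveSketch.RemoveViaIndexOf(letters, "b")); // True
        Console.WriteLine(RemoveSketch.RemoveViaIndexOf(letters, "z")); // False
        Console.WriteLine(string.Join(", ", letters));                  // a, c
    }
}
```

Each call does a search plus a possible shift, which is why calling it once per item to delete tends toward quadratic cost on array-backed lists.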

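// Requires: using System.Collections.Generic; using System.Linq;
// The method name RemoveItems is illustrative; the original was an unnamed fragment.
static void RemoveItems<T>(IList<T> items, IEnumerable<T> itemsToDelete)
{
    // Materialize once so that Count and Contains do not re-enumerate the source.
    var toDelete = itemsToDelete.ToList();

    if (items.SequenceEqual(toDelete)) // Equal lists?
    {
        items.Clear();
        return;
    }

    if (items.Count < toDelete.Count)
    {
        // It is faster to iterate the small list first.
        // Walk backwards so that RemoveAt does not shift unvisited items.
        for (int i = items.Count - 1; i >= 0; i--)
        {
            if (toDelete.Contains(items[i])) items.RemoveAt(i);
        }
    }
    else
    {
        foreach (var x in toDelete)
        {
            items.Remove(x);
        }
    }
}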

This problem would be easier to solve if a RemoveAll extension method were available for the IList<T> interface. So here is one:

/// <summary>
/// Removes all the elements that match the conditions defined by the
/// specified predicate.
/// </summary>
public static int RemoveAll<T>(this IList<T> list, Func<T, int, bool> predicate)
{
    ArgumentNullException.ThrowIfNull(list);
    ArgumentNullException.ThrowIfNull(predicate);

    int i = 0, j = 0;
    try
    {
        for (; i < list.Count; i++)
        {
            if (predicate(list[i], i)) continue;
            if (j < i) list[j] = list[i];
            j++;
        }
    }
    finally
    {
        if (j < i)
        {
            for (; i < list.Count; i++, j++)
                list[j] = list[i];
            while (list.Count > j)
                list.RemoveAt(list.Count - 1);
        }
    }
    return i - j;
}

This is a modified version of a custom List<T>.RemoveAll implementation that is found in this answer. Because the IList<T> interface has no RemoveRange method, the rightmost leftover slots in the IList<T> are cleared by repeatedly removing the last element. This should be a pretty fast operation in most IList<T> implementations.

Now the original problem of removing multiple items from an IList<T> can be solved efficiently like this:

IList<T> items;
IEnumerable<T> itemsToDelete;
//...

HashSet<T> itemsToDeleteSet = new(itemsToDelete);
items.RemoveAll((x, _) => itemsToDeleteSet.Contains(x));

