
How to implement a specialized overload of the List.RemoveAll method, with an index parameter in the predicate?

The List<T>.RemoveAll method is quite useful: it removes multiple items from a list efficiently. Unfortunately, in some scenarios I need extra features that the method doesn't have, and some guarantees that the documentation doesn't provide. It also behaves questionably when the match predicate fails, which worries me. So in this question I am asking for an implementation of the same method, in the form of an extension method, with these features and characteristics:

  1. Instead of a Predicate<T> it accepts a Func<T, int, bool> delegate, where the int is the zero-based index of the T item.
  2. It guarantees that the predicate will be invoked exactly once for each item, in strictly ascending order.
  3. If the predicate returns true for some items and then throws for another item, the items that have been elected for removal are removed from the list before the exception propagates.

Here is the signature of the extension method that I am trying to implement:

public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate);

It returns the number of elements that were removed.
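A candidate implementation of this signature could be checked against requirements 2 and 3 with a small harness along these lines. This is a sketch: RemoveAllContract and Check are hypothetical names, and the scenario mirrors the demo below (remove the odd numbers from 1..15, with the predicate throwing at item 10).

```csharp
using System;
using System.Collections.Generic;

public static class RemoveAllContract
{
    // Hypothetical harness: runs 'removeAll' on the numbers 1..15 with a predicate that
    // elects odd numbers for removal and throws at item 10, then checks requirements 2 and 3.
    public static bool Check(Func<List<int>, Func<int, int, bool>, int> removeAll)
    {
        var list = new List<int>();
        for (int n = 1; n <= 15; n++) list.Add(n);
        var seen = new List<int>(); // indexes received by the predicate, in call order

        try
        {
            removeAll(list, (item, index) =>
            {
                seen.Add(index);
                if (item == 10) throw new InvalidOperationException("test failure");
                return item % 2 == 1;
            });
            return false; // the exception was swallowed instead of propagated
        }
        catch (InvalidOperationException)
        {
            // expected: the predicate's exception reached the caller
        }

        // Requirement 2: items 1..10 were tested exactly once each, in ascending order
        if (seen.Count != 10) return false;
        for (int k = 0; k < seen.Count; k++)
            if (seen[k] != k) return false;

        // Requirement 3: the odd numbers before 10 are gone and nothing else changed
        return string.Join(",", list) == "2,4,6,8,10,11,12,13,14,15";
    }
}
```

For example, RemoveAllContract.Check((l, p) => { int i = 0; return l.RemoveAll(x => p(x, i++)); }) returns false for the built-in method, because its list ends up in the state shown in the demo.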

I tried to implement it using the existing implementation as a starting point, but that code contains performance optimizations that make it quite complex, and injecting the desired "exceptional" behavior is not obvious. I am interested in an implementation that is simple and reasonably efficient. Using LINQ in the implementation is not desirable, because it implies memory allocations that I would like to avoid.


Context: I should demonstrate the behavior of the built-in List<T>.RemoveAll method, and explain why I don't like it. If the match predicate fails for an item in the middle of the list, the items that have already been elected for removal are either not removed, or they are replaced with duplicates of other elements. In all cases the list retains its original size. Here is a minimal demo:

using System;
using System.Collections.Generic;
using System.Linq;

List<int> list = new(Enumerable.Range(1, 15));
Console.WriteLine($"Before RemoveAll: [{String.Join(", ", list)}]");
try
{
    list.RemoveAll(item =>
    {
        if (item == 10) throw new Exception();
        bool removeIt = item % 2 == 1;
        if (removeIt) Console.WriteLine($"Removing #{item}");
        return removeIt;
    });
}
catch { } // Ignore the error for demonstration purposes
finally
{
    Console.WriteLine($"After RemoveAll: [{String.Join(", ", list)}]");
}

The list has 15 numbers, and the intention is to remove the odd numbers from the list. The predicate throws an exception for the 10th number.

Output:

Before RemoveAll: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
Removing #1
Removing #3
Removing #5
Removing #7
Removing #9
After RemoveAll: [2, 4, 6, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

Online demo.

As you can see, the numbers 1 and 3 have been removed, 5, 7 and 9 are still there, and 6 and 8 have been duplicated (there are two occurrences of each). The output that I expected to see instead is:

After RemoveAll: [2, 4, 6, 8, 10, 11, 12, 13, 14, 15]

This would be reasonable and predictable behavior that I could count on. It keeps the danger at a manageable level: I would not risk, for example, duplicating items in a virtual shopping cart, or printing some PDF documents from a selection twice. The existing behavior stretches my comfort level a bit too far.

I have reported this behavior to Microsoft, and the feedback I got is that in case of failure the outcome is undefined. From their point of view there is no difference between the two outputs above (the actual and the expected). Both are equally corrupted, because both represent a state that is neither the original state nor the final, correct state after a successful execution. So they don't think that there is any bug that needs to be fixed, and making changes that could negatively affect the performance of successful executions is not justified. They also believe that the existing behavior is not surprising or unexpected, so there is no reason to document it.
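Why the list ends up in this particular state can be seen from a simplified model of an in-place removal loop: kept elements are copied forward over the removed ones, and the list's size is only updated once the loop finishes. The sketch below is a hypothetical model (CompactionModel and CompactInPlace are made-up names) working on a plain array; it is not the actual .NET source, which differs in its details.

```csharp
using System;

public static class CompactionModel
{
    // Simplified, hypothetical model of an in-place RemoveAll over a backing array.
    // Kept elements are copied forward to 'free'; the logical size is only known after
    // the loop, so an exception thrown by 'match' leaves the array half-compacted.
    public static int CompactInPlace<T>(T[] items, Predicate<T> match)
    {
        int free = 0; // next slot for an element that is kept
        for (int current = 0; current < items.Length; current++)
        {
            if (match(items[current])) continue; // elected for removal: skip it
            items[free++] = items[current];      // move the kept element forward
        }
        return free; // new logical size (never reached if 'match' throws)
    }
}
```

Running it on the numbers 1 to 15 with the demo's predicate (odd numbers elected, exception at 10) leaves the array as 2, 4, 6, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, the same state as the actual output above: four kept items were copied forward before the exception, and nothing else was updated.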

This solution is based on the idea of separating the selection of the items to be removed from the removal itself.

This has the following advantages:

  • If an exception occurs during the selection process, the list is left untouched.
  • The removal process can only fail in catastrophic cases (OutOfMemoryException etc.).

But of course there are also some disadvantages:

  • It requires extra memory to store the intermediate selection result.
  • Some optimizations might not be as effective.

Because of the optimizations mentioned above, I chose to base the selection result on ranges instead of individual indexes, so that we can use List.RemoveRange, which is more efficient than individual RemoveAt calls (assuming that there are in fact ranges with more than one element).

public static List<(int start, int count)> GetIndexRanges<T>(this List<T> list, 
    Func<T, int, bool> predicate)
{
    var result = new List<(int start, int count)>();
    int start = -1;
    for (var i = 0; i < list.Count; i++)
    {
        // see note 1 below
        bool toBeRemoved = predicate(list[i], i);
        if (toBeRemoved)
        {
            if (start < 0)
                start = i; // new range starts
        }
        else if (start >= 0)
        {
            // range finished
            result.Add((start, i - start));
            start = -1;
        }
    }
    if (start >= 0)
    {
        // orphan range at the end
        result.Add((start, list.Count - start));
    }
    return result;
}

public static int RemoveIndexRanges<T>(this List<T> list, 
    List<(int start, int count)> ranges)
{
    var removed = 0;
    foreach (var range in ranges)
    {
        // the "- removed" is there to take into account 
        // that deletion moves the indexes.
        list.RemoveRange(range.start - removed, range.count);
        removed += range.count;
    }
    return removed;
}

Usage:

var ranges = list.GetIndexRanges((item, index) =>
    {
        //if (item == 10) throw new Exception();
        return item % 2 == 1;
    });
// See note 2 below
list.RemoveIndexRanges(ranges);

Note 1: As is, an exception in the predicate is simply propagated during the selection process, with no change to the collection. To give the caller more control over this, GetIndexRanges could be extended to still return everything collected so far, and additionally return any exception as an out parameter:

public static List<(int start, int count)> GetIndexRanges<T>(this List<T> list, 
    Func<T, int, bool> predicate, out Exception? exception)
{
    exception = null; // an out parameter must be assigned on every path
    var result = new List<(int start, int count)>();
    int start = -1;
    int i = 0; // declared outside the loop so it can be used after a break
    for (; i < list.Count; i++)
    {
        bool toBeRemoved = false;
        try 
        { 
            toBeRemoved = predicate(list[i], i); 
        }
        catch (Exception e) 
        { 
            exception = e;
            break; // omit this line to continue with the selection process
        }
        if (toBeRemoved)
        {
            if (start < 0)
                start = i; // new range starts
        }
        else if (start >= 0)
        {
            // range finished
            result.Add((start, i - start));
            start = -1;
        }
    }
    if (start >= 0)
    {
        // orphan range at the end; it stops at i (not list.Count), so after a
        // break it excludes the failing item and the items that were never tested
        result.Add((start, i - start));
    }
    return result;
}

var ranges = list.GetIndexRanges((item, index) =>
    {
        if (item == 10) throw new Exception();
        return item % 2 == 1;
    }, out var exception);

// to fulfil requirement #3, we remove the ranges collected so far
// even in case of an exception
list.RemoveIndexRanges(ranges);

// and then rethrow the exception afterwards, preserving its original stack trace
if (exception != null) 
    System.Runtime.ExceptionServices.ExceptionDispatchInfo.Capture(exception).Throw();

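Note 2: As this is now a two-step process, it is not safe if the list is modified between the two calls: the collected ranges may then refer to the wrong items, or RemoveRange may throw because a range lies beyond the end of the list.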
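One cost of RemoveIndexRanges worth keeping in mind: each List<T>.RemoveRange call shifts every element after the removed range, so with many ranges the tail of the list can be moved once per range (in the odd-numbers example every range holds a single element). A possible single-pass variation is sketched below. The method name RemoveIndexRangesCompact is hypothetical, and like RemoveIndexRanges it assumes that the ranges are sorted, non-overlapping and still match the list (see Note 2). Each surviving element is moved at most once, and RemoveRange is called only once to trim the tail.

```csharp
using System;
using System.Collections.Generic;

public static class RangeRemovalSketch
{
    // Hypothetical alternative to RemoveIndexRanges. Assumes 'ranges' is sorted by start,
    // non-overlapping, and refers to the current contents of 'list'.
    public static int RemoveIndexRangesCompact<T>(this List<T> list,
        List<(int start, int count)> ranges)
    {
        if (ranges.Count == 0) return 0;

        int write = ranges[0].start; // first slot that will be overwritten
        int read = write;            // next element to examine
        foreach (var (start, count) in ranges)
        {
            // copy the survivors that lie between the previous range and this one
            while (read < start) list[write++] = list[read++];
            read = start + count;    // skip the removed range
        }

        // copy the survivors after the last range
        while (read < list.Count) list[write++] = list[read++];

        int removed = list.Count - write;
        list.RemoveRange(write, removed); // trim the now-unused tail in one call
        return removed;
    }
}
```

Applied to the list 1..15 with the ranges (0, 1), (2, 1), (4, 1), (6, 1), (8, 1), it returns 5 and leaves [2, 4, 6, 8, 10, 11, 12, 13, 14, 15].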

"So they don't think that there is any bug that needs to be fixed. They also believe that this behavior is not surprising or unexpected, so there is no need to document it."

They're correct. The method is documented as:

"Removes all the elements that match the conditions defined by the specified predicate."

This supports two scenarios: the predicate returns true to remove an element, or false to leave it as-is. A predicate throwing an exception is not a use case that is intended to be supported.

If you want to be able to pass a predicate that may throw, you could wrap it like this:

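public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate)
{
    Exception? caught = null;
    int index = 0;

    // Relies on List<T>.RemoveAll invoking the predicate once per item in ascending
    // order; this matches the current implementation but is not documented.
    int removed = list.RemoveAll(item =>
    {
        // Ignore the rest of the list once the predicate has thrown
        if (caught != null) return false;

        // Take the current index and advance it for the next item
        int currentIndex = index++;
        try
        {
            return predicate(item, currentIndex);
        }
        catch (Exception e)
        {
            // Keep this item, remember the exception, and let RemoveAll finish
            // so that the items elected so far are actually removed
            caught = e;
            return false;
        }
    });

    if (caught != null)
    {
        // Rethrow with the original stack trace preserved
        System.Runtime.ExceptionServices.ExceptionDispatchInfo.Capture(caught).Throw();
    }

    return removed;
}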

I think that I've managed to come up with an implementation that satisfies all three requirements:

/// <summary>
/// Removes all the elements that match the conditions defined by the specified
/// predicate. In case the predicate fails, the integrity of the list is preserved.
/// </summary>
public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate)
{
    ArgumentNullException.ThrowIfNull(list);
    ArgumentNullException.ThrowIfNull(predicate);

    Span<T> span = CollectionsMarshal.AsSpan(list);
    int i = 0, j = 0;
    try
    {
        for (; i < span.Length; i++)
        {
            if (predicate(span[i], i)) continue;
            if (j < i) span[j] = span[i];
            j++;
        }
    }
    finally
    {
        if (j < i)
        {
            for (; i < span.Length; i++, j++)
                span[j] = span[i];
            list.RemoveRange(j, span.Length - j);
        }
    }
    return i - j;
}

For better performance it uses the CollectionsMarshal.AsSpan method (.NET 5) to get a Span<T> out of the list. The algorithm works just as well when using the list's indexer instead of the span, and replacing span.Length with list.Count.

Online demo.

I haven't benchmarked this implementation, but I expect it to be only marginally slower than the native implementation.
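As mentioned earlier in this answer, the same algorithm also works with the list's indexer instead of a span. A sketch of that variant follows, for targets where CollectionsMarshal (added in .NET 5) or ArgumentNullException.ThrowIfNull (added in .NET 6) is not available. The class name ListIndexerExtensions is hypothetical, and the explicit null checks stand in for ThrowIfNull.

```csharp
using System;
using System.Collections.Generic;

public static class ListIndexerExtensions
{
    // Indexer-based variant of the span algorithm: same two-index compaction,
    // but using list[i] and list.Count, so CollectionsMarshal is not needed.
    public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate)
    {
        if (list == null) throw new ArgumentNullException(nameof(list));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        int i = 0, j = 0; // i: next item to test, j: next slot for a kept item
        try
        {
            for (; i < list.Count; i++)
            {
                if (predicate(list[i], i)) continue; // elected for removal
                if (j < i) list[j] = list[i];        // move the kept item forward
                j++;
            }
        }
        finally
        {
            // Runs on success and on failure: shift the untested tail (including the
            // item whose predicate threw, if any) and drop the leftover slots.
            if (j < i)
            {
                for (; i < list.Count; i++, j++)
                    list[j] = list[i];
                list.RemoveRange(j, list.Count - j);
            }
        }
        return i - j;
    }
}
```

With the predicate from the question (odd numbers elected, exception at item 10), the list ends up as [2, 4, 6, 8, 10, 11, 12, 13, 14, 15] and the exception still reaches the caller.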

I don't know how Microsoft wrote this method.

I tried a few code variations, and I found a case.

Actually, the problem is your throw new Exception(). Without this line, your code runs perfectly. The exception triggers some other case, but I don't know what it is.

if (item >= 10) return false;
bool removeIt = item % 2 == 1;
if (removeIt) Console.WriteLine($"Removing #{item}");
return removeIt;

I found this.

EDIT

Actually, the Func<T, int, bool> predicate does not delete any items itself. It returns a boolean: if it returns true, the item is deleted from the list; if it returns false, the item is not deleted.

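Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.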

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM