简体   繁体   English

如何从一个非常大的列表中按索引有效地删除元素?

[英]How can I efficiently remove elements by index from a very large list?

I have a very large list of integers (about 2 billion elements) and a list with indices (couple thousand elements) at which I need to remove elements from the first list.我有一个非常大的整数列表(大约 20 亿个元素)和一个索引列表(几千个元素),我需要从第一个列表中删除元素。 My current approach is to loop over all indices in the second list, passing each to the RemoveAt() method of the first list:我当前的方法是遍历第二个列表中的所有索引,将每个索引传递给第一个列表的RemoveAt()方法:

indices.Sort();
indices.Reverse();
for (i = 0; i < indices.Count; i++)
{
    largeList.RemoveAt(indices[i]);
}

However, it takes about 2 minutes to finish.但是,大约需要 2 分钟才能完成。 I really need to perform this operation much faster.我真的需要更快地执行此操作。 Is there any way of optimizing this?有什么办法可以优化这个吗?

I have a Intel i9X CPU with 10 cores, so maybe some way of parallel processing?我有一个 10 核的 Intel i9X CPU,所以也许是某种并行处理方式?

Each time RemoveAt() is called it has to move every element that comes after the specified index up by one element.每次调用RemoveAt()时,它都必须将指定索引之后的每个元素向上移动一个元素。 If there are thousands of elements to remove in a very large list, that will result in many, many (needless) moves.如果在一个非常大的列表中有数千个元素要删除,那将导致很多很多(不必要的)移动。

My thinking was that if you can calculate the start index and length of each move, you can process the list in a single pass with no overlapping moves. 我的想法是,如果您可以计算出每个移动的起始索引和长度,则可以一次处理该列表,而不会出现重叠移动。 This is done by comparing adjacent elements in the list of indices to remove.这是通过比较要删除的索引列表中的相邻元素来完成的。 While this does mean building a third List<> of move operations to perform, my hope is having the most efficient, minimal move strategy planned out in advance will pay off in the end (or perhaps there's a way to do that without allocating any objects that presently isn't occurring to me).虽然这确实意味着构建要执行的移动操作的第三个List<> ,但我希望提前计划出最有效、最少的移动策略最终会得到回报(或者也许有一种方法可以在不分配任何对象的情况下做到这一点目前我还没有想到)。

You can see my implementation, RemoveUsingMoveStrategy() , in the benchmark code below.您可以在下面的基准代码中看到我的实现RemoveUsingMoveStrategy() Running the launcher program below in test mode, we can see it — as well as the other answers — produces the correct result when given indices 0 , 1 , 5 , 10 , 11 , 15 , 15 (repeated), 18 , and 19 to remove from a 20 -element List<int> ...test模式下运行下面的启动器程序,我们可以看到它——以及其他答案——在给定索引01510111515 (重复)、 1819时产生正确的结果从20元素List<int> ...

PS> dotnet run --configuration release --framework netcoreapp3.1 -- test
[snip]
RemoveUsingMoveStrategy():
        Elements: { 2, 3, 4, 6, 7, 8, 9, 12, 13, 14, 16, 17 }
           Count: 12
        Sequence: Equal to control list
        Duration: 3.95 milliseconds
        Returned: Input list
[snip]

Benchmark notes基准笔记

  • It was my intention to benchmark against a billion-element List<int> , but RemoveUsingRemoveAt() — inspired by the code in the question — is so inefficient that it was taking too long, so I only went up to 10 million elements.我打算以十亿个元素为基准List<int> ,但是RemoveUsingRemoveAt() - 受问题中的代码启发 - 效率太低以至于花费的时间太长,所以我只增加了 1000 万个元素。

  • I still hope to run a billion-element benchmark that excludes RemoveUsingRemoveAt() .我仍然希望运行一个不包括RemoveUsingRemoveAt()的十亿元素基准测试。 For that reason, I introduced a not-quite-as-naive implementation called RemoveUsingListCopy() to serve as the baseline for comparison across all list sizes.出于这个原因,我引入了一个名为RemoveUsingListCopy()的不太天真的实现,作为比较所有列表大小的基线。 As the name implies, it doesn't modify the input list but creates a new one with the removals applied.顾名思义,它不会修改输入列表,而是创建一个应用了删除的新列表。

  • Since not all implementations require the list of indices to remove to be unique and/or sorted, I figured the fairest way to benchmark would be to include that overhead in the benchmark time where the method requires it.由于并非所有实现都要求要删除的索引列表是唯一的和/或排序的,因此我认为最公平的基准测试方法是在方法需要的基准时间中包括该开销

  • The only other changes I made to code from other answers was to make use of the benchmark's lists ( DataList and RemovalList ) and to change from var to explicit types for reader clarity.我对其他答案中的代码所做的唯一其他更改是使用基准列表( DataListRemovalList )并将var更改为显式类型以使读者清楚。

  • RemovalListLocation indicates from where in DataList indices were removed. RemovalListLocation指示从DataList中的何处删除索引。

    • For Beginning , Middle , and End it is a contiguous RemovalListSize -sized block removed from that location.对于BeginningMiddleEnd ,它是从该位置删除的连续RemovalListSize大小的块。
    • For Random it is RemovalListSize random, valid, unsorted, not-guaranteed-to-be-unique indices generated from a constant seed.对于Random ,它是从常量种子生成的随机、有效、未排序、不保证唯一的索引RemovalListSize

    To keep the results short(er), I opted to only benchmark the Middle — thinking it'd be a good middle -of-the-road compromise — and Random values.为了使结果更短(呃),我选择只对Middle值进行基准测试——认为这是一个很好的折中方案——和Random值。

Benchmark results基准测试结果

  • RemoveUsingRemoveAt() is terrible . RemoveUsingRemoveAt()糟糕 Don't do that.不要那样做。

  • For the smaller list, RemoveUsingListCopy() is consistently the fastest, at the expense of doubling memory usage.对于较小的列表, RemoveUsingListCopy()始终是最快的,但代价是 memory 的使用量加倍。

  • For the larger list, Vernou's answer is as least 5x as fast as the other answers, at the expense of relying on implementation details.对于更大的列表, Vernou 的答案至少是其他答案的 5 倍,但代价是依赖于实现细节。

    This just goes to show that you shouldn't necessarily always favor List<> over an array — unless you need its additional capabilities — because it isn't always superior.这只是表明您不必总是偏爱List<>而不是数组——除非您需要它的附加功能——因为它并不总是优越的。 It insulates the underlying array, preventing you (without reflection) from using more performant access methods like Array.Copy() and unsafe code on it.它隔离底层数组,防止您(没有反射)使用更高效的访问方法,如Array.Copy()unsafe的代码。

  • Theodor Zoulias's answer does well with the smaller list, but I think having to "touch" all 10 million indices — each with a call to HashSet<>.Contains() and increment of an index variable — really hurt it with the larger list. Theodor Zoulias 的回答对较小的列表效果很好,但我认为必须“接触”所有 1000 万个索引——每个索引都调用HashSet<>.Contains()并增加一个索引变量——确实对较大的列表造成了伤害。

  • The remaining implementations (including mine) have roughly the same performance: not the best, but still pretty good.其余的实现(包括我的)具有大致相同的性能:不是最好的,但仍然相当不错。

Benchmark data基准数据

Running the launcher program defined later in this answer in benchmark mode, I get these results from BenchmarkDotNet ...benchmark模式下运行本答案稍后定义的启动器程序,我从BenchmarkDotNet获得这些结果......

PS> dotnet run --configuration release --framework netcoreapp3.1 -- benchmark
[snip]
// * Summary *

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.450 (2004/?/20H1)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.401
  [Host]    : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
  MediumRun : .NET Framework 4.8 (4.8.4200.0), X64 RyuJIT

Job=MediumRun  InvocationCount=1  IterationCount=15  
LaunchCount=2  UnrollFactor=1  WarmupCount=10  

|                        Method |       Runtime | DataListSize | RemovalListSize | RemovalListLocation |           Mean |         Error |        StdDev |         Median | Ratio | RatioSD |
|------------------------------ |-------------- |------------- |---------------- |-------------------- |---------------:|--------------:|--------------:|---------------:|------:|--------:|
|           RemoveUsingListCopy |      .NET 4.8 |        10000 |            2000 |              Random |       379.9 μs |      15.93 μs |      23.36 μs |       380.4 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt |      .NET 4.8 |        10000 |            2000 |              Random |     1,463.7 μs |      93.79 μs |     137.47 μs |     1,434.7 μs |  3.87 |    0.41 |
|       RemoveUsingMoveStrategy |      .NET 4.8 |        10000 |            2000 |              Random |       635.9 μs |      19.77 μs |      29.60 μs |       624.0 μs |  1.67 |    0.14 |
| Answer63496225_TheodorZoulias |      .NET 4.8 |        10000 |            2000 |              Random |       372.1 μs |      11.52 μs |      16.15 μs |       373.8 μs |  0.99 |    0.07 |
|       Answer63496768_CoolBots |      .NET 4.8 |        10000 |            2000 |              Random |       594.5 μs |      25.13 μs |      36.03 μs |       593.1 μs |  1.57 |    0.12 |
|        Answer63495657_NetMage |      .NET 4.8 |        10000 |            2000 |              Random |       618.8 μs |      17.53 μs |      23.99 μs |       622.4 μs |  1.65 |    0.13 |
|         Answer63496256_Vernou |      .NET 4.8 |        10000 |            2000 |              Random |       645.6 μs |      27.28 μs |      39.99 μs |       632.2 μs |  1.71 |    0.16 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy | .NET Core 3.1 |        10000 |            2000 |              Random |       391.5 μs |      10.39 μs |      15.55 μs |       391.6 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt | .NET Core 3.1 |        10000 |            2000 |              Random |     1,402.2 μs |      44.20 μs |      64.80 μs |     1,407.6 μs |  3.59 |    0.21 |
|       RemoveUsingMoveStrategy | .NET Core 3.1 |        10000 |            2000 |              Random |       557.9 μs |      19.73 μs |      27.00 μs |       557.2 μs |  1.43 |    0.10 |
| Answer63496225_TheodorZoulias | .NET Core 3.1 |        10000 |            2000 |              Random |       424.3 μs |      20.90 μs |      29.30 μs |       424.2 μs |  1.09 |    0.09 |
|       Answer63496768_CoolBots | .NET Core 3.1 |        10000 |            2000 |              Random |       535.0 μs |      19.37 μs |      27.16 μs |       537.1 μs |  1.37 |    0.08 |
|        Answer63495657_NetMage | .NET Core 3.1 |        10000 |            2000 |              Random |       557.7 μs |      18.73 μs |      25.63 μs |       550.0 μs |  1.43 |    0.09 |
|         Answer63496256_Vernou | .NET Core 3.1 |        10000 |            2000 |              Random |       554.2 μs |      13.82 μs |      18.45 μs |       554.0 μs |  1.42 |    0.07 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy |      .NET 4.8 |        10000 |            2000 |              Middle |       221.6 μs |       7.25 μs |      10.63 μs |       222.5 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt |      .NET 4.8 |        10000 |            2000 |              Middle |     1,195.3 μs |      20.01 μs |      28.69 μs |     1,187.7 μs |  5.42 |    0.30 |
|       RemoveUsingMoveStrategy |      .NET 4.8 |        10000 |            2000 |              Middle |       405.0 μs |      13.65 μs |      19.14 μs |       410.7 μs |  1.83 |    0.10 |
| Answer63496225_TheodorZoulias |      .NET 4.8 |        10000 |            2000 |              Middle |       206.3 μs |       8.62 μs |      12.09 μs |       204.9 μs |  0.94 |    0.08 |
|       Answer63496768_CoolBots |      .NET 4.8 |        10000 |            2000 |              Middle |       427.5 μs |      15.56 μs |      22.81 μs |       435.4 μs |  1.93 |    0.13 |
|        Answer63495657_NetMage |      .NET 4.8 |        10000 |            2000 |              Middle |       405.4 μs |      13.80 μs |      19.35 μs |       403.8 μs |  1.84 |    0.11 |
|         Answer63496256_Vernou |      .NET 4.8 |        10000 |            2000 |              Middle |       413.9 μs |      15.26 μs |      20.89 μs |       419.8 μs |  1.87 |    0.12 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy | .NET Core 3.1 |        10000 |            2000 |              Middle |       235.2 μs |      10.73 μs |      15.73 μs |       236.2 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt | .NET Core 3.1 |        10000 |            2000 |              Middle |     1,345.6 μs |      32.07 μs |      43.90 μs |     1,352.7 μs |  5.77 |    0.41 |
|       RemoveUsingMoveStrategy | .NET Core 3.1 |        10000 |            2000 |              Middle |       324.0 μs |       4.92 μs |       7.05 μs |       326.6 μs |  1.39 |    0.09 |
| Answer63496225_TheodorZoulias | .NET Core 3.1 |        10000 |            2000 |              Middle |       262.9 μs |       6.18 μs |       9.06 μs |       265.4 μs |  1.12 |    0.08 |
|       Answer63496768_CoolBots | .NET Core 3.1 |        10000 |            2000 |              Middle |       333.6 μs |      10.14 μs |      13.87 μs |       340.9 μs |  1.43 |    0.11 |
|        Answer63495657_NetMage | .NET Core 3.1 |        10000 |            2000 |              Middle |       313.5 μs |       9.05 μs |      12.69 μs |       310.5 μs |  1.34 |    0.11 |
|         Answer63496256_Vernou | .NET Core 3.1 |        10000 |            2000 |              Middle |       332.3 μs |       6.70 μs |       8.95 μs |       331.9 μs |  1.43 |    0.09 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy |      .NET 4.8 |     10000000 |            2000 |              Random |   253,977.1 μs |   2,721.70 μs |   3,989.43 μs |   253,809.0 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt |      .NET 4.8 |     10000000 |            2000 |              Random | 5,191,083.4 μs |  13,200.66 μs |  18,931.99 μs | 5,187,162.3 μs | 20.43 |    0.34 |
|       RemoveUsingMoveStrategy |      .NET 4.8 |     10000000 |            2000 |              Random |    65,365.4 μs |     422.41 μs |     592.16 μs |    65,307.3 μs |  0.26 |    0.00 |
| Answer63496225_TheodorZoulias |      .NET 4.8 |     10000000 |            2000 |              Random |   240,584.4 μs |   3,687.89 μs |   5,048.03 μs |   244,336.1 μs |  0.95 |    0.02 |
|       Answer63496768_CoolBots |      .NET 4.8 |     10000000 |            2000 |              Random |    54,168.4 μs |   1,001.37 μs |   1,436.14 μs |    53,390.3 μs |  0.21 |    0.01 |
|        Answer63495657_NetMage |      .NET 4.8 |     10000000 |            2000 |              Random |    72,501.4 μs |     452.46 μs |     634.29 μs |    72,161.2 μs |  0.29 |    0.00 |
|         Answer63496256_Vernou |      .NET 4.8 |     10000000 |            2000 |              Random |     5,814.0 μs |      89.71 μs |     128.67 μs |     5,825.3 μs |  0.02 |    0.00 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy | .NET Core 3.1 |     10000000 |            2000 |              Random |   239,784.0 μs |   2,721.35 μs |   3,902.88 μs |   241,125.5 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt | .NET Core 3.1 |     10000000 |            2000 |              Random | 5,538,337.5 μs | 353,505.30 μs | 495,565.06 μs | 5,208,226.1 μs | 23.12 |    2.15 |
|       RemoveUsingMoveStrategy | .NET Core 3.1 |     10000000 |            2000 |              Random |    33,071.8 μs |     103.80 μs |     138.57 μs |    33,030.5 μs |  0.14 |    0.00 |
| Answer63496225_TheodorZoulias | .NET Core 3.1 |     10000000 |            2000 |              Random |   240,825.5 μs |     851.49 μs |   1,248.11 μs |   240,520.9 μs |  1.00 |    0.02 |
|       Answer63496768_CoolBots | .NET Core 3.1 |     10000000 |            2000 |              Random |    26,265.0 μs |      90.76 μs |     124.23 μs |    26,253.0 μs |  0.11 |    0.00 |
|        Answer63495657_NetMage | .NET Core 3.1 |     10000000 |            2000 |              Random |    48,670.6 μs |     581.51 μs |     833.99 μs |    48,303.0 μs |  0.20 |    0.00 |
|         Answer63496256_Vernou | .NET Core 3.1 |     10000000 |            2000 |              Random |     5,905.5 μs |      96.27 μs |     131.78 μs |     5,915.1 μs |  0.02 |    0.00 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy |      .NET 4.8 |     10000000 |            2000 |              Middle |   153,776.2 μs |   2,454.90 μs |   3,674.38 μs |   152,872.0 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt |      .NET 4.8 |     10000000 |            2000 |              Middle | 5,245,952.0 μs |  13,845.58 μs |  20,294.67 μs | 5,252,922.4 μs | 34.10 |    0.81 |
|       RemoveUsingMoveStrategy |      .NET 4.8 |     10000000 |            2000 |              Middle |    33,233.6 μs |     110.33 μs |     158.24 μs |    33,217.3 μs |  0.22 |    0.01 |
| Answer63496225_TheodorZoulias |      .NET 4.8 |     10000000 |            2000 |              Middle |   128,949.8 μs |     560.72 μs |     804.17 μs |   128,724.9 μs |  0.84 |    0.02 |
|       Answer63496768_CoolBots |      .NET 4.8 |     10000000 |            2000 |              Middle |    48,965.1 μs |      70.75 μs |      94.45 μs |    48,957.3 μs |  0.32 |    0.01 |
|        Answer63495657_NetMage |      .NET 4.8 |     10000000 |            2000 |              Middle |    32,641.5 μs |      66.85 μs |      91.51 μs |    32,610.0 μs |  0.21 |    0.01 |
|         Answer63496256_Vernou |      .NET 4.8 |     10000000 |            2000 |              Middle |     2,982.2 μs |      29.47 μs |      41.31 μs |     2,961.9 μs |  0.02 |    0.00 |
|                               |               |              |                 |                     |                |               |               |                |       |         |
|           RemoveUsingListCopy | .NET Core 3.1 |     10000000 |            2000 |              Middle |   144,208.7 μs |   2,035.88 μs |   2,984.16 μs |   142,693.2 μs |  1.00 |    0.00 |
|           RemoveUsingRemoveAt | .NET Core 3.1 |     10000000 |            2000 |              Middle | 5,235,957.7 μs |  13,674.19 μs |  20,043.46 μs | 5,241,536.1 μs | 36.32 |    0.78 |
|       RemoveUsingMoveStrategy | .NET Core 3.1 |     10000000 |            2000 |              Middle |    16,547.3 μs |      72.72 μs |     101.95 μs |    16,520.7 μs |  0.11 |    0.00 |
| Answer63496225_TheodorZoulias | .NET Core 3.1 |     10000000 |            2000 |              Middle |   137,218.2 μs |     716.45 μs |     980.69 μs |   137,027.0 μs |  0.95 |    0.02 |
|       Answer63496768_CoolBots | .NET Core 3.1 |     10000000 |            2000 |              Middle |    23,728.5 μs |      79.84 μs |     111.93 μs |    23,689.9 μs |  0.16 |    0.00 |
|        Answer63495657_NetMage | .NET Core 3.1 |     10000000 |            2000 |              Middle |    17,298.3 μs |     216.46 μs |     310.44 μs |    17,165.5 μs |  0.12 |    0.00 |
|         Answer63496256_Vernou | .NET Core 3.1 |     10000000 |            2000 |              Middle |     2,999.7 μs |      85.78 μs |     123.03 μs |     2,957.1 μs |  0.02 |    0.00 |
[snip]

Benchmark code基准代码

To see the various implementations, scroll a third of the way down and look for the methods decorated with [Benchmark()] .要查看各种实现,请向下滚动三分之一并查找用[Benchmark()]装饰的方法。

This requires the BenchmarkDotNet package .这需要BenchmarkDotNet package

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using BenchmarkDotNet.Attributes;

namespace SO63495264
{
    public class Benchmarks
    {
        public enum RemovalLocation
        {
            Random,
            Beginning,
            Middle,
            End
        }

        [Params(10_000, 10_000_000)]
        public int DataListSize
        {
            get; set;
        }

        [Params(2_000)]
        public int RemovalListSize
        {
            get; set;
        }

        //[ParamsAllValues()]
        [Params(RemovalLocation.Random, RemovalLocation.Middle)]
        public RemovalLocation RemovalListLocation
        {
            get; set;
        }

        public List<int> DataList
        {
            get;
            set;
        }

        public IReadOnlyList<int> RemovalList
        {
            get;
            set;
        }

        [GlobalSetup()]
        public void GlobalSetup()
        {
            IEnumerable<int> removalIndices;

            if (RemovalListLocation == RemovalLocation.Random)
            {
                // Pass a constant seed so the same indices get generated for every run
                Random random = new Random(12345);

                removalIndices = Enumerable.Range(0, RemovalListSize)
                    .Select(i => random.Next(DataListSize));
            }
            else
            {
                int removalSegmentOffset = RemovalListLocation switch {
                    RemovalLocation.Beginning => 0,
                    RemovalLocation.Middle => (DataListSize - RemovalListSize) / 2,
                    RemovalLocation.End => DataListSize - RemovalListSize,
                    _ => throw new Exception($"Unexpected {nameof(RemovalLocation)} enumeration value {RemovalListLocation}.")
                };

                removalIndices = Enumerable.Range(removalSegmentOffset, RemovalListSize);
            }

            // For efficiency, create a single List<int> to be reused by all iterations for a given set of parameters
            DataList = new List<int>(DataListSize);
            RemovalList = removalIndices.ToList().AsReadOnly();
        }

        [IterationSetup()]
        public void IterationSetup()
        {
            // Each iteration could modify DataList, so repopulate its elements for the
            // next iteration.  DataList is either new or has been Clear()ed by IterationCleanup().
            for (int i = 0; i < DataListSize; i++)
                DataList.Add(i);
        }

        [IterationCleanup()]
        public void IterationCleanup()
        {
            DataList.Clear();
        }

        [GlobalCleanup()]
        public void GlobalCleanup()
        {
            // Force collection of the List<> for the current set of parameters
            int generation = GC.GetGeneration(DataList);

            DataList = null;
            GC.Collect(generation, GCCollectionMode.Forced, true);
        }

        [Benchmark(Baseline = true)]
        public List<int> RemoveUsingListCopy()
        {
            HashSet<int> removalSet = RemovalList.ToHashSet();
            List<int> newList = new List<int>(DataList.Count - removalSet.Count);

            for (int index = 0; index < DataList.Count; index++)
                if (!removalSet.Contains(index))
                    newList.Add(DataList[index]);

            return newList;
        }

        // Based on Stack Overflow question 63495264
        // https://stackoverflow.com/q/63495264/150605
        [Benchmark()]
        public List<int> RemoveUsingRemoveAt()
        {
            // The collection of indices to remove must be in decreasing order
            // with no duplicates; all are assumed to be in-range for DataList
            foreach (int index in RemovalList.Distinct().OrderByDescending(i => i))
                DataList.RemoveAt(index);

            return DataList;
        }

        // From Stack Overflow answer 63498191
        // https://stackoverflow.com/a/63498191/150605
        [Benchmark()]
        public List<int> RemoveUsingMoveStrategy()
        {
            // The collection of indices to remove must be in increasing order
            // with no duplicates; all are assumed to be in-range for DataList
            int[] coreIndices = RemovalList.Distinct().OrderBy(i => i).ToArray();
            List<(int startIndex, int count, int distance)> moveOperations
                = new List<(int, int, int)>();

            int candidateIndex;
            for (int i = 0; i < coreIndices.Length; i = candidateIndex)
            {
                int currentIndex = coreIndices[i];
                int elementCount = 1;

                // Merge removal of consecutive indices into one operation
                while ((candidateIndex = i + elementCount) < coreIndices.Length
                    && coreIndices[candidateIndex] == currentIndex + elementCount
                )
                {
                    elementCount++;
                }

                int nextIndex = candidateIndex < coreIndices.Length
                    ? coreIndices[candidateIndex]
                    : DataList.Count;
                int moveCount = nextIndex - currentIndex - elementCount;

                // Removing one or more elements from the end of the
                // list will result in a no-op, 0-length move; omit it
                if (moveCount > 0)
                {
                    moveOperations.Add(
                        (
                            startIndex: currentIndex + elementCount,
                            count: moveCount,
                            distance: i + elementCount
                        )
                    );
                }
            }

            // Perform the element moves
            foreach ((int startIndex, int count, int distance) in moveOperations)
            {
                for (int offset = 0; offset < count; offset++)
                {
                    int sourceIndex = startIndex + offset;
                    int targetIndex = sourceIndex - distance;

                    DataList[targetIndex] = DataList[sourceIndex];
                }
            }

            // "Trim" the number of removed elements from the end of the list
            DataList.RemoveRange(DataList.Count - coreIndices.Length, coreIndices.Length);

            return DataList;
        }

        // Adapted from Stack Overflow answer 63496225 revision 4
        // https://stackoverflow.com/revisions/63496225/4
        [Benchmark()]
        public List<int> Answer63496225_TheodorZoulias()
        {
            HashSet<int> indicesSet = new HashSet<int>(RemovalList);
            int index = 0;

            DataList.RemoveAll(_ => indicesSet.Contains(index++));

            return DataList;
        }


        // Adapted from Stack Overflow answer 63496768 revision 2
        // https://stackoverflow.com/revisions/63496768/2
        [Benchmark()]
        public List<int> Answer63496768_CoolBots()
        {
            List<int> sortedIndicies = RemovalList.Distinct().OrderBy(i => i).ToList();

            int sourceStartIndex = 0;
            int destStartIndex = 0;
            int spanLength = 0;

            int skipCount = 0;

            // Copy items up to last index to be skipped
            foreach (int skipIndex in sortedIndicies)
            {
                spanLength = skipIndex - sourceStartIndex;
                destStartIndex = sourceStartIndex - skipCount;

                for (int i = sourceStartIndex; i < sourceStartIndex + spanLength; i++)
                {
                    DataList[destStartIndex] = DataList[i];
                    destStartIndex++;
                }

                sourceStartIndex = skipIndex + 1;
                skipCount++;
            }

            // Copy remaining items (between last index to be skipped and end of list)
            spanLength = DataList.Count - sourceStartIndex;
            destStartIndex = sourceStartIndex - skipCount;

            for (int i = sourceStartIndex; i < sourceStartIndex + spanLength; i++)
            {
                DataList[destStartIndex] = DataList[i];
                destStartIndex++;
            }

            DataList.RemoveRange(destStartIndex, sortedIndicies.Count);

            return DataList;
        }

        // Adapted from Stack Overflow answer 63495657 revision 6
        // https://stackoverflow.com/revisions/63495657/6
        [Benchmark()]
        public List<int> Answer63495657_NetMage()
        {
            List<int> removeAtList = RemovalList.Distinct().OrderBy(i => i).ToList();

            int srcCount = DataList.Count;
            int ralCount = removeAtList.Count;
            int removeAtIndice = 1;
            int freeIndex = removeAtList[0];
            int current = freeIndex + 1;
            while (current < srcCount)
            {
                while (removeAtIndice < ralCount && current == removeAtList[removeAtIndice])
                {
                    ++current;
                    ++removeAtIndice;
                }

                if (current < srcCount)
                    DataList[freeIndex++] = DataList[current++];
            }

            DataList.RemoveRange(freeIndex, srcCount - freeIndex);

            return DataList;
        }

        // Adapted from Stack Overflow answer 63496256 revision 3
        // https://stackoverflow.com/revisions/63496256/3
        [Benchmark()]
        public List<int> Answer63496256_Vernou()
        {
            List<int> indices = RemovalList.Distinct().OrderBy(i => i).ToList();

            //Get the internal array
            int[] largeArray = (int[]) typeof(List<int>)
                .GetField("_items", BindingFlags.Instance | BindingFlags.NonPublic)
                .GetValue(DataList);

            int current = 0;
            int copyFrom = 0;

            for (int i = 0; i < indices.Count; i++)
            {
                int copyTo = indices[i];
                if (copyTo < copyFrom)
                {
                    //In case the indice is duplicate,
                    //The item is already passed
                    continue;
                }
                int copyLength = copyTo - copyFrom;
                Array.Copy(largeArray, copyFrom, largeArray, current, copyLength);
                current += copyLength;
                copyFrom = copyTo + 1;
            }
            //Resize the internal array
            DataList.RemoveRange(DataList.Count - indices.Count, indices.Count);

            return DataList;
        }
    }
}

Launcher code启动器代码

RunBenchmark() defines the type of benchmark job to run and for which runtime(s). RunBenchmark()定义要运行的基准测试作业的类型以及运行时。

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Reflection;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Environments;
using BenchmarkDotNet.Jobs;

namespace SO63495264
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args?.Length == 0)
            {
                string assemblyFilePath = Assembly.GetExecutingAssembly().Location;
                string assemblyFileName = System.IO.Path.GetFileName(assemblyFilePath);

                Console.WriteLine($"{assemblyFileName} {{ benchmark | test }}");
            }
            else if (string.Equals(args[0], "benchmark", StringComparison.OrdinalIgnoreCase))
                RunBenchmark();
            else if (string.Equals(args[0], "test", StringComparison.OrdinalIgnoreCase))
                RunTest();
            else
                Console.WriteLine($"Unexpected parameter \"{args[0]}\"");
        }

        static void RunBenchmark()
        {
            Job baseJob = Job.MediumRun;
            IConfig config = DefaultConfig.Instance;

            foreach (Runtime runtime in new Runtime[] { CoreRuntime.Core31, ClrRuntime.Net48 })
            {
                config = config.AddJob(
                    baseJob.WithRuntime(runtime)
                );
            }

            BenchmarkDotNet.Running.BenchmarkRunner.Run<Benchmarks>(config);
        }

        static void RunTest()
        {
            const int ListSize = 20;
            const int MaxDisplayElements = 20;

            IEnumerable<int> data = Enumerable.Range(0, ListSize);
            IReadOnlyList<int> indices = new List<int>(
                new int[] {
                    0,                               1, // First two indices
                    ListSize / 4,
                    ListSize / 2,     ListSize / 2 + 1, // Consecutive indices
                    ListSize / 4 * 3, ListSize / 4 * 3, // Duplicate indices
                    ListSize - 2,     ListSize - 1      // Last two indices
                }
            ).AsReadOnly();

            // Discover and invoke the benchmark methods the same way BenchmarkDotNet would
            Benchmarks benchmarks = new Benchmarks() {
                RemovalList = indices
            };
            IEnumerable<MethodInfo> benchmarkMethods = benchmarks.GetType()
                .GetMethods(BindingFlags.Instance | BindingFlags.Public)
                .Where(
                    method => method.CustomAttributes.Any(
                        attribute => attribute.AttributeType == typeof(BenchmarkAttribute)
                    )
                );

            // Call a known-good method to get the correct results for comparison
            benchmarks.DataList = data.ToList();
            List<int> controlList = benchmarks.RemoveUsingListCopy();

            foreach (MethodInfo benchmarkMethod in benchmarkMethods)
            {
                List<int> inputList = data.ToList();

                benchmarks.DataList = inputList;
                Stopwatch watch = Stopwatch.StartNew();
                List<int> outputList = (List<int>) benchmarkMethod.Invoke(benchmarks, Array.Empty<object>());
                watch.Stop();

                Console.WriteLine($"{benchmarkMethod.Name}():");
                Console.WriteLine($"\tElements: {{ {string.Join(", ", outputList.Take(MaxDisplayElements))}{(outputList.Count > MaxDisplayElements ? ", ..." : "") } }}");
                Console.WriteLine($"\t   Count: {outputList.Count:N0}");
                Console.WriteLine($"\tSequence: {(outputList.SequenceEqual(controlList) ? "E" : "*Not* e")}qual to control list");
                Console.WriteLine($"\tDuration: {watch.Elapsed.TotalMilliseconds:N2} milliseconds");
                Console.WriteLine($"\tReturned: {(object.ReferenceEquals(outputList, inputList) ? "Input list" : "New list")}");
            }
        }
    }
}

In order to benchmark on .NET Framework (4.8) I had to add the following properties to my .NET Core .csproj project file:为了对 .NET Framework (4.8) 进行基准测试,我必须将以下属性添加到我的 .NET Core .csproj项目文件中:

<TargetFrameworks>netcoreapp3.1;net48</TargetFrameworks>
<LangVersion>8.0</LangVersion>

41 characters to spare!剩余41 个字符

Since the order of the source list is significant, you can move each item down the list, skipping indices to be removed, and then remove the end of the list.由于源列表的顺序很重要,您可以将每个项目向下移动列表,跳过要删除的索引,然后删除列表的末尾。

UPDATE: Took the.Net Core source code for RemoveAll and modified it for a indice list instead of a predicate.更新:获取RemoveAll的 .Net Core 源代码并将其修改为索引列表而不是谓词。

UPDATE 2: Optimized to not repeat tests if possible.更新 2:优化为尽可能不重复测试。

UPDATE 3: Simplified as optimization having extra code proved slower in benchmarks.更新 3:简化为具有额外代码的优化在基准测试中被证明较慢。

Having src as the large list, and removeAtList as the indices to remove in some random order, you can do:src作为大列表,并将removeAtList作为要以某种随机顺序删除的索引,您可以执行以下操作:

removeAtList.Sort();

var srcCount = src.Count;
var ralCount = removeAtList.Count;
var removeAtIndice = 1;
var freeIndex = removeAtList[0];
var current = freeIndex+1;
while (current < srcCount) {
    while (removeAtIndice < ralCount && current == removeAtList[removeAtIndice]) {
        ++current;
        ++removeAtIndice;
    }

    if (current < srcCount)
        src[freeIndex++] = src[current++];
}

src.RemoveRange(freeIndex, srcCount-freeIndex);

For a one billion element list of random integers, and a 1000 - 3000 element list of random indices, I get 1.1 ms per remove with this algorithm.对于 10 亿个随机整数元素列表和 1000 - 3000 个随机索引元素列表,我使用此算法每次删除得到 1.1 毫秒。 Using RemoveAt , I get over 232.77 ms per remove, so this is about 200 times faster.使用RemoveAt ,我每次删除的时间超过 232.77 毫秒,所以这快了大约 200 倍。

One way to allow this to be parallelized would be to break the list into multiple fragments;允许并行化的一种方法是将列表分成多个片段; perhaps initially (arbitrarily) separate slabs of 1 million elements.也许最初(任意)分离 100 万个元素的板。 As long as each slab maintains its own count, you can split the work by index into removals from different slabs (based purely on the counts), and then do the actual removal work concurrently.只要每个 slab 维护自己的计数,您就可以按索引将工作拆分为从不同的 slab 中移除(完全基于计数),然后并发执行实际的移除工作。 If you leave some spare capacity in each, you can also add elements into the middle more cheaply, as you are usually only touching one slab.如果你在每个中都留有一些备用容量,你还可以更便宜地向中间添加元素,因为你通常只接触一个 slab。 Random access will be a little slower, as you may need to look at multiple slab counts to determine the correct slab, but if the slab counts are maintained in a contiguous vector (rather than against each slab), you should have excellent memory cache hit while doing it.随机访问会稍微慢一些,因为您可能需要查看多个 slab 计数以确定正确的 slab,但是如果 slab 计数保持在一个连续的向量中(而不是针对每个 slab),您应该有出色的 memory 缓存命中在做的时候。

This answer is based on other answers here - mainly, I am shifting elements up within the list, as suggested by @Vernou (in their answer) and @BACON (in comments).这个答案是基于这里的其他答案——主要是,我在列表中向上移动元素,正如@Vernou(在他们的答案中)和@BACON(在评论中)所建议的那样。 This one is finally performant (unlike my first few approaches), and is faster than other solutions posted so far, at least in my tests - I tried OP's setup of 2_000_000_000 entries and 2_000 indicies - runtime is under 10 seconds on my laptop (i7-8550U @ 1.8GHz, 16GB RAM):这个最终是高性能的(不像我的前几种方法),并且比迄今为止发布的其他解决方案更快,至少在我的测试中 - 我尝试了 OP 的2_000_000_000条目和2_000设置 - 我的笔记本电脑上的运行时间不到 10 秒(i7 -8550U @ 1.8GHz,16GB 内存):

static void FilterOutIndicies(List<int> values, List<int> sortedIndicies)
{
    int sourceStartIndex = 0;
    int destStartIndex = 0;
    int spanLength = 0;

    int skipCount = 0;

    // Copy items up to last index to be skipped
    foreach (var skipIndex in sortedIndicies)
    {
        spanLength = skipIndex - sourceStartIndex;
        destStartIndex = sourceStartIndex - skipCount;

        for (int i = sourceStartIndex; i < sourceStartIndex + spanLength; i++)
        {
            values[destStartIndex] = values[i];
            destStartIndex++;
        }

        sourceStartIndex = skipIndex + 1;
        skipCount++;
    }

    // Copy remaining items (between last index to be skipped and end of list)
    spanLength = values.Count - sourceStartIndex;
    destStartIndex = sourceStartIndex - skipCount;

    for (int i = sourceStartIndex; i < sourceStartIndex + spanLength; i++)
    {
        values[destStartIndex] = values[i];
        destStartIndex++;
    }

    values.RemoveRange(destStartIndex, sortedIndicies.Count);
}

The method List.RemoveAt copy all next items from the removed item. List.RemoveAt方法从删除的项目中复制所有下一个项目。 In your case, this copy 2,000 * 2,000,000,000 times each items (not really, but the true is near).在您的情况下,此副本每个项目 2,000 * 2,000,000,000 次(不是真的,但接近真实)。

A solution is manually copy item between the removed item and the next removed item:一种解决方案是在删除的项目和下一个删除的项目之间手动复制项目:

static void Main(string[] args)
{
    var largeList = Enumerable.Range(0, 2_000_000_000).ToList();

    var indices = new List<int>();
    var rand = new Random();
    for (var i = 0; i < 20000; i++)
    {
        indices.Add(rand.Next(0, largeList.Count - 1));
    }
    indices.Sort();

    var watch = new Stopwatch();
    watch.Start();

    // You can convert the list to array with ToArray,
    // but this duplicate the memory use.
    // Or get the internal array by reflection,
    // but reflection on external library isn't recommended
    var largeArray = (int[])typeof(List<int>)
        .GetField("_items", BindingFlags.Instance | BindingFlags.NonPublic)
        .GetValue(largeList);

    var current = 0;
    var copyFrom = 0;

    for (var i = 0; i < indices.Count; i++)
    {
        var copyTo = indices[i];
        if (copyTo < copyFrom)
        {
            //In case the indice is duplicate,
            //The item is already passed
            continue;
        }
        var copyLength = copyTo - copyFrom;
        Array.Copy(largeArray, copyFrom, largeArray, current, copyLength);
        current += copyLength;
        copyFrom = copyTo + 1;
    }
    //Resize the internal array
    largeList.RemoveRange(largeList.Count - indices.Count, indices.Count);

    watch.Stop();
    Console.WriteLine(watch.Elapsed);
    Console.WriteLine(largeList.Count);
}

When you have multiple items to remove from a List , and replacing the List with a new List is not an option, the most efficient way is to use the RemoveAll method instead of the RemoveAt .当您有多个项目要从List中删除,并且无法用新List替换List时,最有效的方法是使用RemoveAll方法而不是RemoveAt The RemoveAll rearranges the internal state of the List only once, instead of doing it once per each item removed. RemoveAll仅重新排列List的内部 state 一次,而不是对每个删除的项目重新排列一次。

The RemoveAll accepts aPredicate<T> that will be invoked once for each item in the list (the large list). RemoveAll接受一个Predicate<T> ,它将为列表(大列表)中的每个项目调用一次。 Unfortunately this delegate doesn't receive the index of the currently tested item.不幸的是,这个代表没有收到当前测试项目的索引。 You could however depend on knowing how the RemoveAll is implemented.但是,您可能依赖于了解RemoveAll的实现方式。 The source code reveals that the items are tested sequentially in ascending order. 代码显示项目按升序顺序测试。 So based on this knowledge you could remove the selected indices from the list, very efficiently, with this three-liner:因此,基于这些知识,您可以使用以下三行非常有效地从列表中删除选定的索引:

var indicesSet = new HashSet<int>(indices);
int index = 0;
largeList.RemoveAll(_ => indicesSet.Contains(index++));

But you really shouldn't .但你真的不应该 This solution will break horribly if a future version of .NET comes with a different internal implementation of the RemoveAll .如果 .NET 的未来版本带有RemoveAll的不同内部实现,则此解决方案将严重崩溃。 So consider this to be a dirty hack, and not a production-quality solution to the problem.因此,将此视为肮脏的 hack,而不是解决问题的生产质量解决方案。

I have a very large list of integers (about 2 billion elements) and a list with indices (couple thousand elements) at which I need to remove elements from the first list.我有一个非常大的整数列表(大约 20 亿个元素)和一个带有索引的列表(几千个元素),我需要从第一个列表中删除元素。 My current approach is to loop over all indices in the second list, passing each to the RemoveAt() method of the first list:我目前的方法是遍历第二个列表中的所有索引,将每个索引传递给第一个列表的RemoveAt()方法:

indices.Sort();
indices.Reverse();
for (i = 0; i < indices.Count; i++)
{
    largeList.RemoveAt(indices[i]);
}

However, it takes about 2 minutes to finish.但是,大约需要 2 分钟才能完成。 I really need to perform this operation much faster.我真的需要更快地执行此操作。 Is there any way of optimizing this?有没有办法优化这个?

I have a Intel i9X CPU with 10 cores, so maybe some way of parallel processing?我有一个 10 核的 Intel i9X CPU,所以也许有某种并行处理方式?

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM