Paradox: Why is yield return faster than list here?

People have shown countless times that yield return is slower than a list.

For example: Is 'yield return' slower than "old school" return?

However, when I ran a benchmark, I got the opposite result:

Results:
TestYield: Time =1.19 sec
TestList : Time =4.22 sec

Here, List is about 3.5 times slower (4.22 s vs. 1.19 s). This happens regardless of size. This makes no sense.

IEnumerable<int> CreateNumbers() //for yield
{
    for (int i = 0; i < Size; i++) yield return i;
}

IEnumerable<int> CreateNumbers() //for list
{
    var list = new List<int>();
    for (int i = 0; i < Size; i++) list.Add(i);
    return list;
}

Here is how I consume them:

foreach (var value in CreateNumbers()) sum += value;
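For reference, here is roughly what that foreach compiles to when the source is typed as IEnumerable<int>. This is a simplified sketch of the lowering described in the C# language specification; the class and method names are made up for illustration. Since both versions of CreateNumbers return IEnumerable<int>, the List is also consumed through the interface: every MoveNext and Current is an interface call, in the list case on a boxed List<int>.Enumerator.

```csharp
using System.Collections.Generic;

static class ForeachLowering
{
    // Hand-written equivalent of: foreach (var value in source) sum += value;
    public static int SumLikeForeach(IEnumerable<int> source)
    {
        int sum = 0;
        IEnumerator<int> e = source.GetEnumerator();
        try
        {
            while (e.MoveNext())        // interface call per element
            {
                int value = e.Current;  // interface call per element
                sum += value;
            }
        }
        finally
        {
            e.Dispose();                // IEnumerator<T> is IDisposable
        }
        return sum;
    }
}
```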

I followed all the correct benchmarking rules to avoid misleading results, so that is not the issue.

If you look at the underlying code, yield return is a state-machine abomination, yet it is faster. Why?
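For a sense of what that state machine amounts to, here is a simplified, hand-written approximation of the class the compiler generates for the yield version. It is an illustration, not the actual generated code: the real type has a compiler-generated name and also checks which thread calls GetEnumerator. The whole "abomination" is one small object with a few int fields and a switch, and MoveNext never stores the values it produces.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Approximation of what the compiler emits for:
//     for (int i = 0; i < size; i++) yield return i;
sealed class CountingIterator : IEnumerable<int>, IEnumerator<int>
{
    private readonly int _size;
    private int _state;       // 0 = not started, 1 = suspended after a yield, -1 = finished
    private int _i;           // the loop variable, hoisted into a field
    private int _current;
    private bool _handedOut;  // has this instance already been returned by GetEnumerator?

    public CountingIterator(int size) => _size = size;

    public int Current => _current;
    object IEnumerator.Current => _current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:
                _i = 0;           // first call: start the loop
                break;
            case 1:
                _i++;             // resume right after the previous "yield return"
                break;
            default:
                return false;     // already finished
        }
        if (_i < _size)
        {
            _current = _i;        // yield return i;
            _state = 1;
            return true;
        }
        _state = -1;
        return false;
    }

    // The first enumeration reuses this object; later ones get a fresh copy.
    public IEnumerator<int> GetEnumerator()
    {
        if (!_handedOut) { _handedOut = true; return this; }
        return new CountingIterator(_size);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public void Reset() => throw new NotSupportedException();
    public void Dispose() => _state = -1;
}
```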

Edit: All the answers reproduced the result: yield is indeed faster than List.

New results with Size passed to the List constructor:
TestYield: Time =1.001
TestList: Time =1.403
That brings the gap down from roughly 3.5 times slower to about 40% slower.

However, the implications are mind-bending. It would mean that all the programmers from the 1960s onward who used a list as the default collection were wrong and should have been shot (fired), because they didn't use the best tool for the job (yield).

The answers argued that yield should be faster because it is not materialized.

1) I do not accept this logic. Yield has real logic behind the scenes; it is not a "theoretical model" but a compiler construct. Therefore it effectively materializes on consumption. I do not accept the argument that it "didn't materialize", since the cost is already paid on USE.

2) If a boat can travel by sea but an old woman can't, you cannot demand that the boat "move by land", which is what you did here with the list. If a list requires materialization and yield doesn't, that is not a "problem with yield" but a feature. Yield should not be penalized in the test just because it has more uses.

3) My point is that the purpose of the test was to find the fastest way for a method to return its results to the consumer when you know that the ENTIRE SET will be consumed.

Does yield become the new de facto standard for returning lists from methods?

Edit 2: If I use a plain inline array, I get the same performance as with yield.

Test 3:
TestYield: Time =0.987
TestArray: Time =0.962
TestList: Time =1.516

int[] CreateNumbers()
{
    var array = new int[Size];
    for (int i = 0; i < Size; i++) array[i] = i;
    return array;
}

Therefore, yield is automatically inlined into an array. List isn't.

If you measure the yield version without materializing a list, it has an advantage over the other version because it never has to allocate and resize a large list (or trigger GC).
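A minimal sketch to make "without materializing" concrete (LazyDemo, Produce and Run are made-up names): the producer and the consumer each write to a log, and the log shows them alternating one element at a time, so the full set of values never exists in memory at once.

```csharp
using System.Collections.Generic;

static class LazyDemo
{
    // Producer: records each value just before handing it to the consumer.
    static IEnumerable<int> Produce(int count, List<string> log)
    {
        for (int i = 0; i < count; i++)
        {
            log.Add($"produce {i}");
            yield return i;
        }
    }

    // Consumer: records each value as it arrives, then returns the combined log.
    public static List<string> Run(int count)
    {
        var log = new List<string>();
        foreach (var value in Produce(count, log))
            log.Add($"consume {value}");
        return log;
    }
}
```

LazyDemo.Run(2) returns "produce 0", "consume 0", "produce 1", "consume 1": each value is handed over and used before the next one is computed.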

Based on your edit, I would like to add the following:

However, keep in mind that semantically you're looking at two different methods. One produces a collection. It is finite in size; you can store references to the collection, change its elements, and share it.

The other produces a sequence. It is potentially unbounded, you get a fresh enumeration each time you iterate over it, and there may or may not be a collection behind it.
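A short illustration of both points, using a hypothetical Naturals method: the sequence is unbounded, so no list could back it, and every enumeration runs the iterator body again from the start (tracked by the Runs counter).

```csharp
using System.Collections.Generic;

static class SequenceDemo
{
    // Incremented each time an enumeration actually starts the iterator body.
    public static int Runs;

    // An unbounded sequence: a List could never hold it, an iterator can.
    // (i would eventually wrap around past int.MaxValue; irrelevant here.)
    public static IEnumerable<int> Naturals()
    {
        Runs++;
        for (int i = 0; ; i++)
            yield return i;
    }
}
```

Naturals().Take(5).Sum() is 10. Creating the sequence without enumerating it leaves Runs unchanged, and enumerating the same sequence object twice (for example with two Take calls) increments Runs twice.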

They are not the same thing. The compiler doesn't create a collection to implement a sequence. If you implement a sequence by materializing a collection behind the scenes, you will see performance similar to the version that uses a list.
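For example, a method like the following (the name is hypothetical) is still a lazy IEnumerable<int> to its caller, but it fills a List<int> before producing the first element, so it should show the list version's allocation and resizing cost on top of the iterator overhead.

```csharp
using System.Collections.Generic;

static class MaterializedSequence
{
    // A sequence on the outside, a collection on the inside.
    public static IEnumerable<int> CreateNumbers(int size)
    {
        var buffer = new List<int>();         // grows and reallocates like the list version
        for (int i = 0; i < size; i++)
            buffer.Add(i);

        foreach (var x in buffer)             // only now do values reach the caller
            yield return x;
    }
}
```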

BenchmarkDotNet doesn't let you time deferred execution by default, so you have to construct a test that consumes the methods, which is what I have done below. I ran this through BenchmarkDotNet and got the following:

       Method |     Mean |    Error |   StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
------------- |---------:|---------:|---------:|------------:|------------:|------------:|--------------------:|
 ConsumeYield | 475.5 us | 7.010 us | 6.214 us |           - |           - |           - |                40 B |
  ConsumeList | 958.9 us | 7.271 us | 6.801 us |    285.1563 |    285.1563 |    285.1563 |           1049024 B |

Notice the allocations. In some scenarios this could make a difference.
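If you want to see the allocation difference without BenchmarkDotNet, a rough sketch, assuming .NET Core 3.0 or later for GC.GetAllocatedBytesForCurrentThread, is to read the thread's allocation counter around the consuming loop. AllocationDemo and its methods are made-up names. Exact byte counts depend on the runtime, but the yield version should allocate little more than the iterator object, while the list version allocates every backing array it outgrows.

```csharp
using System;
using System.Collections.Generic;

static class AllocationDemo
{
    static int _sink; // keeps the computed sum observable

    static IEnumerable<int> CreateNumbersYield(int size)
    {
        for (int i = 0; i < size; i++) yield return i;
    }

    static IEnumerable<int> CreateNumbersList(int size)
    {
        var list = new List<int>();
        for (int i = 0; i < size; i++) list.Add(i);
        return list;
    }

    // Managed bytes this thread allocated while creating and summing the sequence.
    static long BytesAllocatedWhileSumming(Func<IEnumerable<int>> create)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        int sum = 0;
        foreach (var x in create()) sum += x;
        _sink = sum;
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static (long yieldBytes, long listBytes) Measure(int size)
    {
        // Warm-up pass, so one-time startup work is less likely to be counted.
        BytesAllocatedWhileSumming(() => CreateNumbersYield(size));
        BytesAllocatedWhileSumming(() => CreateNumbersList(size));

        long y = BytesAllocatedWhileSumming(() => CreateNumbersYield(size));
        long l = BytesAllocatedWhileSumming(() => CreateNumbersList(size));
        return (y, l);
    }
}
```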

We can offset some of the allocations by creating the list with the correct size, but ultimately this is not an apples-to-apples comparison. Numbers below:

       Method |     Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
------------- |---------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
 ConsumeYield | 470.8 us |  2.508 us |  2.346 us |           - |           - |           - |                40 B |
  ConsumeList | 836.2 us | 13.456 us | 12.587 us |    124.0234 |    124.0234 |    124.0234 |            400104 B |

Code below:

using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class Test
{
    static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<Test>();
    }

    public int Size = 100000;

    [Benchmark]
    public int ConsumeYield()
    {
        var sum = 0;
        foreach (var x in CreateNumbersYield()) sum += x;
        return sum;
    }

    [Benchmark]
    public int ConsumeList()
    {
        var sum = 0;
        foreach (var x in CreateNumbersList()) sum += x;
        return sum;
    }

    public IEnumerable<int> CreateNumbersYield() //for yield
    {
        for (int i = 0; i < Size; i++) yield return i;
    }

    public IEnumerable<int> CreateNumbersList() //for list
    {
        var list = new List<int>(); // the second set of numbers uses new List<int>(Size)
        for (int i = 0; i < Size; i++) list.Add(i);
        return list;
    }
}

A couple of things you must take into account:

  • List<T> consumes memory, but you can iterate over it again and again without any additional resources. To achieve the same with yield, you need to materialize the sequence via ToList().
  • It's desirable to set the capacity when creating a List<T>. This avoids resizing the internal array.
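A quick sketch of the second point (CapacityDemo and CountResizes are hypothetical names): it counts how often List<T> swaps in a larger backing array while items are added. The growth policy is a runtime implementation detail, so only the pre-sized case is guaranteed to report zero.

```csharp
using System.Collections.Generic;

static class CapacityDemo
{
    // Counts how many times the list's backing array is replaced while adding 'size' items.
    public static int CountResizes(int size, bool setInitialCapacity)
    {
        var list = setInitialCapacity ? new List<int>(size) : new List<int>();
        int resizes = 0;
        int capacity = list.Capacity;
        for (int i = 0; i < size; i++)
        {
            list.Add(i);
            if (list.Capacity != capacity)
            {
                resizes++;              // a larger array was allocated and the old one copied
                capacity = list.Capacity;
            }
        }
        return resizes;
    }
}
```

On current .NET an empty List<int> gets a capacity of 4 on the first Add and doubles after that; with that policy, adding a million items without a preset capacity resizes 19 times, and every discarded array is left for the GC.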

Here's what I've got:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // warming up
        CreateNumbersYield(1);
        CreateNumbersList(1, true);
        Measure(null, () => { });

        // testing
        var size = 1000000;

        Measure("Yield", () => CreateNumbersYield(size)); // deferred: only creates the iterator, nothing is enumerated
        Measure("Yield + ToList", () => CreateNumbersYield(size).ToList());
        Measure("List", () => CreateNumbersList(size, false));
        Measure("List + Set initial capacity", () => CreateNumbersList(size, true));

        Console.ReadLine();
    }

    static void Measure(string testName, Action action)
    {
        var sw = new Stopwatch();

        sw.Start();
        action();
        sw.Stop();

        Console.WriteLine($"{testName} completed in {sw.Elapsed}");
    }

    static IEnumerable<int> CreateNumbersYield(int size) //for yield
    {
        for (int i = 0; i < size; i++)
        {
            yield return i;
        }
    }

    static IEnumerable<int> CreateNumbersList(int size, bool setInitialCapacity) //for list
    {
        var list = setInitialCapacity ? new List<int>(size) : new List<int>();

        for (int i = 0; i < size; i++)
        {
            list.Add(i);
        }

        return list;
    }
}

Results (release build):

Yield completed in 00:00:00.0001683
Yield + ToList completed in 00:00:00.0121015
List completed in 00:00:00.0060071
List + Set initial capacity completed in 00:00:00.0033668

If we compare the comparable cases (Yield + ToList vs. List + Set initial capacity), yield is much slower.
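To time the work a yield-based method actually does, the sequence has to be enumerated inside the measured region. A minimal sketch of such a helper (FairMeasure, Consume and Time are hypothetical names), summing into a long so large sequences don't overflow:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class FairMeasure
{
    // Enumerates the whole sequence so that deferred (yield-based) work actually runs.
    public static long Consume(IEnumerable<int> sequence)
    {
        long sum = 0;
        foreach (var x in sequence) sum += x;
        return sum;
    }

    // Times creation plus full enumeration of the sequence returned by 'create'.
    public static TimeSpan Time(Func<IEnumerable<int>> create, out long sum)
    {
        var sw = Stopwatch.StartNew();
        sum = Consume(create());
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Inside the program above, FairMeasure.Time(() => CreateNumbersYield(size), out var sum) would then include the iterator's full cost, in the same way that timing CreateNumbersList and then looping over the result includes the list's.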

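Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM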