简体   繁体   中英

Paradox: Why is yield return faster than list here

People have proven countless times, that yield return is slower than list .

Example: Is 'yield return' slower than "old school" return?

However when I tried, a benchmark, I got the opposite results:

Results:
TestYield: Time =1.19 sec
TestList : Time =4.22 sec

Here, List is 400% slower. This happens regardless size. This makes no sense.

IEnumerable<int> CreateNumbers() //for yield
{
    for (int i = 0; i < Size; i++) yield return i;
}

IEnumerable<int> CreateNumbers() //for list
{
    var list = new List<int>();
    for (int i = 0; i < Size; i++) list.Add(i);
    return list;
}

Here is how I consume them:

foreach (var value in CreateNumbers()) sum += value;

I use all the correct benchmark rules to avoid conflicting results so this is not the issue.

If you see the underlying code, yield return is a state machine abomination, yet it is faster. Why?

Edit: All answers replicated that indeed Yield is faster than list.

New Results With Size set on constructor:
TestYield: Time =1.001
TestList: Time =1.403
From a 400% slower difference, down to 40% slower difference.

However, the insights are mind breaking. It means that all those programmers from 1960 and later that used list as the default collection were wrong and should have been shot (fired), because they didn't use the best tool for the situation (yield).

The answers argued that yield should be faster because it is not materialized.

1) I do not accept this logic. Yield has internal logic behind the scene, it is not a "theoretical model" but a compiler construct. Therefore it automatically materialises on consumption. I do not accept the argument that it "didn't materialise", since the cost is already paid on USE.

2) If a boat can travel by sea, but an old woman can't, you cannot demand the boat to "move by land". As you did here with the list. If a list requires materialization, and yield doesn't, that is not a "problem of yield" but instead a "feature". Yield should not be penalized in the test, just because it has more uses.

3) What i am arguing here is that the purpose of the test was to find the "Fastest collection" to consume / return results returned by a method if you know that the ENTIRE SET will be consumed.

Does yield become the new "De facto standard" for returning list arguments from methods.

Edit2: if i use pure inline array, it obtains the same performance as a Yield.

Test 3:
TestYield: Time =0.987
TestArray: Time =0.962
TestList: Time =1.516

int[] CreateNumbers()
{
    var list = new int[Size];
    for (int i = 0; i < Size; i++) list[i] = i;
    return list;
}

Therefore, yield is automatically inlined into an array. List isn't.

If you measure the version using yield without materializing the list, it will have an advantage over the other version as it won't have to allocate and resize a large list (as well as trigger GC).

Based on your edit I would like to add the following:

However, keep in mind that semantically you're looking at two different methods. One produces a collection . It is finite in size, you can store references to the collection, change its elements, and share it.

The other produces a sequence . It is potentially unbounded, you get a new copy each time you iterate over it, and there may or may not be a collection behind it.

They are not the same thing. The compiler doesn't create a collection to implement a sequence. If you implement a sequence by materializing a collection behind the scenes you will see similar performance as the version that uses a list.

BenchmarkDotNet doesn't allow you to time deferred execution by default so you have to construct a test that consumes the methods which is what I have done below. I ran this through BenchmarkDotNet and got the following.

       Method |     Mean |    Error |   StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
------------- |---------:|---------:|---------:|------------:|------------:|------------:|--------------------:|
 ConsumeYield | 475.5 us | 7.010 us | 6.214 us |           - |           - |           - |                40 B |
  ConsumeList | 958.9 us | 7.271 us | 6.801 us |    285.1563 |    285.1563 |    285.1563 |           1049024 B |

Notice the allocations. For some scenarios this could make a difference.

We can offset some of the allocations by allocating the correct size list, but ultimately this is not an apples to apples comparison. Numbers below.

       Method |     Mean |     Error |    StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
------------- |---------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
 ConsumeYield | 470.8 us |  2.508 us |  2.346 us |           - |           - |           - |                40 B |
  ConsumeList | 836.2 us | 13.456 us | 12.587 us |    124.0234 |    124.0234 |    124.0234 |            400104 B |

Code below.

[MemoryDiagnoser]
public class Test
{
    static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<Test>();
    }

    public int Size = 100000;

    [Benchmark]
    public int ConsumeYield()
    {
        var sum = 0;
        foreach (var x in CreateNumbersYield()) sum += x;
        return sum;
    }

    [Benchmark]
    public int ConsumeList()
    {
        var sum = 0;
        foreach (var x in CreateNumbersList()) sum += x;
        return sum;
    }

    public IEnumerable<int> CreateNumbersYield() //for yield
    {
        for (int i = 0; i < Size; i++) yield return i;
    }

    public IEnumerable<int> CreateNumbersList() //for list
    {
        var list = new List<int>();
        for (int i = 0; i < Size; i++) list.Add(i);
        return list;
    }
}

A couple of things you must take into account:

  • List<T> consumes memory, but you can iterate it again and again without any additional resources. To achieve the same with yield , you need to materialize the sequence via ToList() .
  • it's desirable to set capacity when producing List<T> . This will avoid inner array resizing.

Here's what I've got:

class Program
{
    static void Main(string[] args)
    {
        // warming up
        CreateNumbersYield(1);
        CreateNumbersList(1, true);
        Measure(null, () => { });

        // testing
        var size = 1000000;

        Measure("Yield", () => CreateNumbersYield(size));
        Measure("Yield + ToList", () => CreateNumbersYield(size).ToList());
        Measure("List", () => CreateNumbersList(size, false));
        Measure("List + Set initial capacity", () => CreateNumbersList(size, true));

        Console.ReadLine();
    }

    static void Measure(string testName, Action action)
    {
        var sw = new Stopwatch();

        sw.Start();
        action();
        sw.Stop();

        Console.WriteLine($"{testName} completed in {sw.Elapsed}");
    }

    static IEnumerable<int> CreateNumbersYield(int size) //for yield
    {
        for (int i = 0; i < size; i++)
        {
            yield return i;
        }
    }

    static IEnumerable<int> CreateNumbersList(int size, bool setInitialCapacity) //for list
    {
        var list = setInitialCapacity ? new List<int>(size) : new List<int>();

        for (int i = 0; i < size; i++)
        {
            list.Add(i);
        }

        return list;
    }
}

Results (release build):

Yield completed in 00:00:00.0001683
Yield + ToList completed in 00:00:00.0121015
List completed in 00:00:00.0060071
List + Set initial capacity completed in 00:00:00.0033668

If we compare comparable cases ( Yield + ToList & List + Set initial capacity ), yield is much slower.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM