C# yield return performance

How much space is reserved for the underlying collection behind a method using yield return syntax WHEN I PERFORM a ToList() on it? Is there a chance it will reallocate, and thus decrease performance compared to the standard approach where I create a list with a predefined capacity?

The two scenarios:

    public IEnumerable<T> GetList1()
    {
        foreach( var item in collection )
            yield return item.Property;
    }

    public IEnumerable<T> GetList2()
    {
        List<T> outputList = new List<T>( collection.Count() );
        foreach( var item in collection )
            outputList.Add( item.Property );

        return outputList;
    }

yield return does not create an array that has to be resized, the way List does; instead, it creates an IEnumerable backed by a state machine.

For instance, let's take this method:

public static IEnumerable<int> Foo()
{
    Console.WriteLine("Returning 1");
    yield return 1;
    Console.WriteLine("Returning 2");
    yield return 2;
    Console.WriteLine("Returning 3");
    yield return 3;
}

Now let's call it and assign that enumerable to a variable:

var elems = Foo();

None of the code in Foo has executed yet. Nothing will be printed on the console. But if we iterate over it, like this:

foreach(var elem in elems)
{
    Console.WriteLine( "Got " + elem );
}

On the first iteration of the foreach loop, the Foo method will be executed until the first yield return. Then, on the second iteration, the method will "resume" from where it left off (right after the yield return 1), and execute until the next yield return. The same goes for all subsequent elements.
At the end of the loop, the console will look like this:

Returning 1
Got 1
Returning 2
Got 2
Returning 3
Got 3
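To make the "resume" behaviour a bit more concrete, here is a rough sketch of what the foreach loop is doing for you: it asks for an enumerator and keeps calling MoveNext() until that returns false. The ResumeDemo class, the Run method and the log list are names I made up for illustration; this Foo writes to a list instead of the console so the order can be inspected.

```csharp
using System;
using System.Collections.Generic;

public static class ResumeDemo
{
    // Same shape as Foo above, but it logs into a list instead of the console.
    public static IEnumerable<int> Foo(List<string> log)
    {
        log.Add("Returning 1");
        yield return 1;
        log.Add("Returning 2");
        yield return 2;
    }

    // Roughly what foreach does: get an enumerator, call MoveNext until it returns false.
    public static List<string> Run()
    {
        var log = new List<string>();
        IEnumerable<int> elems = Foo(log);   // nothing inside Foo has run yet
        log.Add("Created");

        using (IEnumerator<int> e = elems.GetEnumerator())
        {
            while (e.MoveNext())             // each call runs Foo up to its next yield return
            {
                log.Add("Got " + e.Current);
            }
        }
        return log;
    }

    public static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line);
    }
}
```

"Created" is logged before "Returning 1", which shows that calling Foo by itself executes none of its body.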

This means you can write methods like this:

public static IEnumerable<int> GetAnswers()
{
    while( true )
    {
        yield return 42;
    }
}

You can call the GetAnswers method, and every time you request an element, it'll give you 42; the sequence never ends. You couldn't do this with a List, because a list has to have a finite size.
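An endless sequence is still safe to consume as long as the caller stops asking. A minimal sketch using LINQ's Take (the InfiniteDemo class and FirstThree method are just illustrative names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InfiniteDemo
{
    public static IEnumerable<int> GetAnswers()
    {
        while (true)
        {
            yield return 42;
        }
    }

    // Take(3) stops requesting elements after three, so the loop in
    // GetAnswers is simply never resumed a fourth time.
    public static List<int> FirstThree()
    {
        return GetAnswers().Take(3).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FirstThree()));   // prints "42, 42, 42"
    }
}
```

Calling ToList() directly on GetAnswers(), without the Take, would keep adding elements until memory runs out, so only materialize sequences you know to be finite.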

> How much space is reserved for the underlying collection behind a method using yield return syntax?

There's no underlying collection.

There's an object, but it isn't a collection. Just how much space it takes up depends on what it needs to keep track of.

> There's a chance it will reallocate

No.

> and thus decrease performance compared to the standard approach where I create a list with a predefined capacity?

It will almost certainly take up less memory than creating a list with a predefined capacity.

Let's try a hand-written example. Say we had the following code:

public static IEnumerable<int> CountToTen()
{
  for(var i = 1; i != 11; ++i)
    yield return i;
}

Running foreach over this iterates through the numbers 1 to 10 inclusive.

Now let's do this the way we would have to if yield did not exist. We'd do something like:

private class CountToTenEnumerator : IEnumerator<int>
{
  private int _current;
  public int Current
  {
    get
    {
      if(_current == 0)
        throw new InvalidOperationException();
      return _current;
    }
  }
  object IEnumerator.Current
  {
    get { return Current; }
  }
  public bool MoveNext()
  {
    if(_current == 10)
      return false;
    _current++;
    return true;
  }
  public void Reset()
  {
    throw new NotSupportedException();
    // We *could* just set _current back, but the object produced by
    // yield won't do that, so we'll match that.
  }
  public void Dispose()
  {
  }
}
private class CountToTenEnumerable : IEnumerable<int>
{
  public IEnumerator<int> GetEnumerator()
  {
    return new CountToTenEnumerator();
  }
  IEnumerator IEnumerable.GetEnumerator()
  {
    return GetEnumerator();
  }
}
public static IEnumerable<int> CountToTen()
{
  return new CountToTenEnumerable();
}

Now, for a variety of reasons this is quite different from the code you're likely to get from the version using yield, but the basic principle is the same. As you can see, two object allocations are involved (the same number as if we had a collection and then did a foreach on it), plus the storage of a single int. In practice we can expect yield to store a few more bytes than that, but not a lot.

Edit: yield actually does a trick where the first GetEnumerator() call on the same thread that obtained the object returns that same object, so one object serves as both the enumerable and the enumerator. Since this covers over 99% of use cases, yield actually does one allocation rather than two.

Now let's look at:
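If you want to see that trick for yourself, a small probe along these lines should do it. This relies on an implementation detail of the compiler rather than anything the language guarantees, so treat the result as an observation about the Microsoft C# compiler as I understand it; SingleAllocationDemo and Probe are made-up names.

```csharp
using System;
using System.Collections.Generic;

public static class SingleAllocationDemo
{
    public static IEnumerable<int> CountToTen()
    {
        for (var i = 1; i != 11; ++i)
            yield return i;
    }

    // Returns whether the first and the second GetEnumerator() call handed
    // back the very object that CountToTen() returned.
    public static (bool FirstIsSame, bool SecondIsSame) Probe()
    {
        IEnumerable<int> seq = CountToTen();
        using (IEnumerator<int> first = seq.GetEnumerator())
        using (IEnumerator<int> second = seq.GetEnumerator())
        {
            return (object.ReferenceEquals(seq, first),
                    object.ReferenceEquals(seq, second));
        }
    }

    public static void Main()
    {
        var result = Probe();
        Console.WriteLine(result.FirstIsSame);
        Console.WriteLine(result.SecondIsSame);
    }
}
```

With the compilers I know of, the first flag is true (the enumerable reuses itself as the enumerator) and the second is false (a second enumeration needs its own state, so a fresh object is allocated).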

public IEnumerable<T> GetList1()
{
  foreach( var item in collection )
    yield return item.Property;
}

While this uses more memory than simply doing return collection, it doesn't use much more; the only thing the generated enumerator really needs to keep track of is the enumerator it gets by calling GetEnumerator() on collection, which it wraps.

This is going to use massively less memory than the wasteful second approach you mention, and be much faster to get going, because the first element is available without first copying everything into a new list.

Edit:

You've changed your question to include "syntax WHEN I PERFORM a ToList() on it", which is worth considering.

Now, here we need to add a third possibility: knowledge of the collection's size.

Here, there is the possibility that using new List<T>(capacity) will prevent reallocations of the internal array while the list is being built. That can indeed be a considerable saving.
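To get a feel for how much a known capacity saves, you can count how often List<T>.Capacity changes while adding items. This is only a sketch; CapacityDemo and CountReallocations are names I invented, and the exact growth steps of the default list are an implementation detail.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Adds `count` items and counts how many times Capacity changed,
    // i.e. how many times the internal array had to be (re)allocated.
    public static int CountReallocations(List<int> list, int count)
    {
        int reallocations = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                reallocations++;
                lastCapacity = list.Capacity;
            }
        }
        return reallocations;
    }

    public static void Main()
    {
        // No capacity given: the list grows in steps as items are added.
        Console.WriteLine(CountReallocations(new List<int>(), 1000));
        // Capacity known up front: the array never has to grow.
        Console.WriteLine(CountReallocations(new List<int>(1000), 1000));   // prints 0
    }
}
```

Each growth step also copies the existing elements into the new array, which is the cost a predefined capacity avoids.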

If the object that has ToList called on it implements ICollection<T>, then ToList will end up first doing a single allocation of an internal array of T and then calling ICollection<T>.CopyTo().
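One way to observe that path is to hand ToList() a collection that records which of its members get called. SpyCollection<T> below is a hypothetical wrapper written only for this demonstration; the behaviour it reveals belongs to the runtime's List<T> and LINQ implementation and could differ between versions.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: forwards to a List<T> and counts the calls ToList() makes.
public sealed class SpyCollection<T> : ICollection<T>
{
    private readonly List<T> _inner;
    public int CopyToCalls { get; private set; }
    public int GetEnumeratorCalls { get; private set; }

    public SpyCollection(IEnumerable<T> items) { _inner = new List<T>(items); }

    public int Count { get { return _inner.Count; } }
    public bool IsReadOnly { get { return true; } }

    public void CopyTo(T[] array, int arrayIndex)
    {
        CopyToCalls++;
        _inner.CopyTo(array, arrayIndex);
    }

    public IEnumerator<T> GetEnumerator()
    {
        GetEnumeratorCalls++;
        return _inner.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public bool Contains(T item) { return _inner.Contains(item); }
    public void Add(T item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Remove(T item) { throw new NotSupportedException(); }
}

public static class ToListDemo
{
    public static void Main()
    {
        var spy = new SpyCollection<int>(new[] { 1, 2, 3 });
        List<int> copy = spy.ToList();
        Console.WriteLine(copy.Count);               // prints 3
        Console.WriteLine(spy.CopyToCalls);          // how often CopyTo was used
        Console.WriteLine(spy.GetEnumeratorCalls);   // how often the items were enumerated
    }
}
```

On the runtimes I'm aware of, the copy goes through CopyTo and the enumerator is not touched, whereas a sequence produced by yield return offers no Count or CopyTo and has to be enumerated item by item.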

This would mean that your GetList2 would result in a faster ToList() than your GetList1.

However, your GetList2 has already wasted time and memory doing what ToList() will do with the results of GetList1 anyway!

What it should have done here was just return new List<T>(collection); and be done with it.

If, though, we need to actually do something inside GetList1 or GetList2 (e.g. convert elements, filter elements, track averages, and so on), then GetList1 is going to be faster and lighter on memory: much lighter if we never call ToList() on it, and slightly lighter if we do call ToList(), because, again, the faster and lighter ToList() is offset by GetList2 having been slower and heavier in the first place by exactly the same amount.
