What is the purpose/advantage of using yield return iterators in C#?

All of the examples I've seen of using yield return x; inside a C# method could be done in the same way by just returning the whole list. In those cases, is there any benefit or advantage in using the yield return syntax vs. returning the list?

Also, in what types of scenarios would yield return be used where you couldn't just return the complete list?

But what if you were building a collection yourself?

In general, iterators can be used to lazily generate a sequence of objects. For example, the Enumerable.Range method does not have any kind of collection internally. It just generates the next number on demand. There are many uses for this kind of lazy sequence generation using a state machine. Most of them are covered under functional programming concepts.

In my opinion, if you are looking at iterators just as a way to enumerate through a collection (which is only one of the simplest use cases), you're going the wrong way. As I said, iterators are a means of returning sequences. The sequence might even be infinite. There would be no way to return a list with infinite length and use the first 100 items. It has to be lazy sometimes. Returning a collection is considerably different from returning a collection generator (which is what an iterator is). It's comparing apples to oranges.

Hypothetical example:

static IEnumerable<int> GetPrimeNumbers() {
   for (int num = 2; ; ++num) 
       if (IsPrime(num))
           yield return num;
}

static void Main() { 
   foreach (var i in GetPrimeNumbers()) 
       if (i < 10000)
           Console.WriteLine(i);
       else
           break;
}

This example prints prime numbers less than 10000. You can easily change it to print primes less than a million without touching the prime number generation algorithm at all. In this example, you can't return a list of all prime numbers because the sequence is infinite and the consumer doesn't even know how many items it wants from the start.
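As a rough illustration of the same point, the consumer can also bound the sequence with LINQ's Take or TakeWhile. This is only a sketch: the IsPrime helper below is a hypothetical naive trial-division check (the example above leaves it undefined), and the class name is made up for the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PrimeDemo
{
    // Hypothetical helper: naive trial division, for illustration only.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; ++d)
            if (n % d == 0) return false;
        return true;
    }

    // The loop has no end condition; the caller decides when to stop pulling.
    public static IEnumerable<int> GetPrimeNumbers()
    {
        for (int num = 2; ; ++num)
            if (IsPrime(num))
                yield return num;
    }

    public static void Main()
    {
        // Only as many primes are computed as the consumer asks for.
        Console.WriteLine(string.Join(", ", GetPrimeNumbers().Take(5)));  // 2, 3, 5, 7, 11
        Console.WriteLine(GetPrimeNumbers().TakeWhile(p => p < 10000).Count());
    }
}
```

Changing the bound means changing only the consumer; GetPrimeNumbers stays untouched.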

The fine answers here suggest that a benefit of yield return is that you don't need to create a list; lists can be expensive. (Also, after a while, you'll find them bulky and inelegant.)

But what if you don't have a List?

yield return allows you to traverse data structures (not necessarily Lists) in a number of ways. For example, if your object is a Tree, you can traverse the nodes in pre- or post-order without creating other lists or changing the underlying data structure.

// Children first, then the node itself: this is a post-order traversal.
public IEnumerable<T> PostOrder()
{
    foreach (T k in kids)
        foreach (T n in k.PostOrder())
            yield return n;
    yield return (T) this;
}

public IEnumerable<T> PreOrder()
{
    yield return (T) this;
    foreach (T k in kids)
        foreach (T n in k.PreOrder())
            yield return n;
}
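For readers who want something they can compile, here is a self-contained sketch of a parent-first (pre-order) and a children-first (post-order) traversal. The Node class, its Name and Kids members, and the sample tree are all hypothetical, invented for this illustration rather than taken from the answer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical n-ary tree node, used only for this illustration.
class Node
{
    public string Name { get; }
    public List<Node> Kids { get; } = new List<Node>();

    public Node(string name, params Node[] kids)
    {
        Name = name;
        Kids.AddRange(kids);
    }

    // Parent first, then each subtree.
    public IEnumerable<Node> PreOrder()
    {
        yield return this;
        foreach (Node k in Kids)
            foreach (Node n in k.PreOrder())
                yield return n;
    }

    // Each subtree first, then the parent.
    public IEnumerable<Node> PostOrder()
    {
        foreach (Node k in Kids)
            foreach (Node n in k.PostOrder())
                yield return n;
        yield return this;
    }
}

static class TreeDemo
{
    // Joins node names so a traversal order is easy to read.
    public static string Names(IEnumerable<Node> nodes)
    {
        var names = new List<string>();
        foreach (Node n in nodes) names.Add(n.Name);
        return string.Join(" ", names);
    }

    public static void Main()
    {
        var root = new Node("A", new Node("B", new Node("D")), new Node("C"));
        Console.WriteLine(Names(root.PreOrder()));   // A B D C
        Console.WriteLine(Names(root.PostOrder()));  // D B C A
    }
}
```

Neither traversal builds an intermediate list, and the tree itself is never modified.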

Lazy Evaluation/Deferred Execution

The "yield return" iterator blocks won't execute any of the code until you actually call for that specific result. This means they can also be chained together efficiently. Pop quiz: how many times will the following code iterate over the file?

var query = File.ReadLines(@"C:\MyFile.txt")
                .Where(l => l.Contains("search text"))
                .Select(l => int.Parse(l.Substring(5, 8)))
                .Where(i => i > 10);

int sum=0;
foreach (int value in query) 
{
    sum += value;
}

The answer is exactly once, and not until the foreach loop actually runs. Even though I have three separate LINQ operator functions, we still only loop through the contents of the file one time.
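One way to convince yourself of this is to count the passes. The sketch below is an assumption-laden stand-in, not the code from the quiz: ReadLines here is a hypothetical iterator over a few in-memory strings (so no file is needed), and the Passes and LinesRead counters exist only for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    public static int Passes;     // how many times the source was enumerated
    public static int LinesRead;  // how many lines were handed out in total

    // Hypothetical stand-in for File.ReadLines, so the sketch needs no file on disk.
    public static IEnumerable<string> ReadLines()
    {
        Passes++;
        foreach (string line in new[] { "id=12", "skip me", "id=7", "id=40" })
        {
            LinesRead++;
            yield return line;
        }
    }

    public static void Main()
    {
        var query = ReadLines()
            .Where(l => l.StartsWith("id="))
            .Select(l => int.Parse(l.Substring(3)))
            .Where(i => i > 10);

        Console.WriteLine(Passes);     // 0: building the query runs none of the iterator body

        int sum = 0;
        foreach (int value in query)
            sum += value;

        Console.WriteLine(sum);        // 52
        Console.WriteLine(Passes);     // 1
        Console.WriteLine(LinesRead);  // 4
    }
}
```

Each line flows through all three operators before the next line is requested, which is why a single pass is enough.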

This has benefits other than performance. For example, I can write a fairly simple and generic method to read and pre-filter a log file once, and use that same method in several different places, where each use adds on different filters. Thus, I maintain good performance while also efficiently re-using code.

Infinite Lists

See my answer to this question for a good example:
C# fibonacci function returning errors

Basically, I implement the Fibonacci sequence using an iterator block that will never stop (at least, not before reaching int.MaxValue), and then use that implementation in a safe way.
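The linked answer is not reproduced here, so the following is only a sketch of what such a never-ending iterator might look like, not the author's actual code. The class and method names are made up, and long is used instead of int purely as a choice for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FibDemo
{
    // Never terminates on its own. In the default unchecked context the values
    // wrap around silently once they exceed long.MaxValue, so callers must bound it.
    public static IEnumerable<long> Fibonacci()
    {
        long a = 0, b = 1;
        while (true)
        {
            yield return a;
            long next = a + b;
            a = b;
            b = next;
        }
    }

    public static void Main()
    {
        // "Safe" use: the consumer limits how many terms it pulls.
        Console.WriteLine(string.Join(", ", Fibonacci().Take(10)));
        // 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
    }
}
```

Calling ToList() or Count() directly on Fibonacci() would never return, which is why the bound belongs on the consuming side.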

Improved Semantics and Separation of Concerns

Again using the file example from above, we can now easily separate the code that reads the file, the code that filters out unneeded lines, and the code that actually parses the results. That first one, especially, is very re-usable.

This is one of those things that's much harder to explain with prose than it is to just show with a simple visual (see note 1):

[Image: imperative vs. functional separation of concerns]

If you can't see the image, it shows two versions of the same code, with background highlights for different concerns. The LINQ code has all of the colors nicely grouped, while the traditional imperative code has the colors intermingled. The author argues (and I concur) that this result is typical of using LINQ vs. using imperative code: LINQ does a better job of organizing your code to have a better flow between sections.


1 I believe this to be the original source: https://twitter.com/mariofusco/status/571999216039542784 . Also note that this code is Java, but the C# would be similar.

Sometimes the sequences you need to return are just too large to fit in memory. For example, about 3 months ago I took part in a data migration project between MS SQL databases. Data was exported in XML format. yield return turned out to be quite useful with XmlReader. It made the programming much easier. For example, suppose a file had 1000 Customer elements: if you just read this file into memory, all of them have to be held in memory at the same time, even if they are handled sequentially. So, you can use iterators to traverse the collection one element at a time. In that case you only need memory for one element.

As it turned out, using XmlReader for our project was the only way to make the application work - it ran for a long time, but at least it did not hang the entire system and did not raise OutOfMemoryException. Of course, you can work with XmlReader without yield iterators. But iterators made my life much easier (I would not have written the import code so quickly and without trouble otherwise). See this page to see how yield iterators are used for solving real problems (not just academic ones involving infinite sequences).
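The project's code is not shown, so the following is only a sketch of how yield return and XmlReader can be combined, using the commonly seen XNode.ReadFrom pattern. The StreamElements name, the in-memory XML string, and the id attribute are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class XmlStreamDemo
{
    // Yields matching elements one at a time instead of loading the whole document.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom materializes just this element and leaves the reader
                    // on the node that follows it, so no extra Read() is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static void Main()
    {
        // Hypothetical sample data; a real import would pass a StreamReader over the file.
        string xml = "<Customers><Customer id=\"1\"/><Customer id=\"2\"/><Customer id=\"3\"/></Customers>";
        foreach (XElement customer in StreamElements(new StringReader(xml), "Customer"))
            Console.WriteLine((string)customer.Attribute("id"));
    }
}
```

Only one Customer element is materialized at a time; once the consumer moves on, the previous element can be garbage collected.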

In toy/demonstration scenarios, there isn't a lot of difference. But there are situations where yielding iterators are useful - sometimes, the entire list isn't available (e.g. streams), or the list is computationally expensive and unlikely to be needed in its entirety.

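If the entire list is gigantic, it might eat a lot of memory just to sit around, whereas with yield you only work with what you need, when you need it, regardless of how many items there are.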

Take a look at the discussion of lazy versus eager evaluation on Eric White's blog (an excellent blog, by the way).

Using yield return you can iterate over items without ever having to build a list. If you don't need the list, but want to iterate over some set of items, it can be easier to write

foreach (var foo in GetSomeFoos()) {
    // operate on foo
}

than

foreach (var foo in AllFoos) {
    if (/* some case where we do want to operate on foo */) {
        // operate on foo
    } else if (/* another case */) {
        // operate on foo
    }
}

You can put all of the logic for determining whether or not you want to operate on foo inside your method using yield return, and your foreach loop can be much more concise.
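To make that concrete, here is a minimal sketch with made-up data. The integer AllFoos array and the two selection rules (multiples of 4, values above 10) are hypothetical placeholders for whatever "some case" and "another case" mean in real code.

```csharp
using System;
using System.Collections.Generic;

static class FooDemo
{
    // Hypothetical data source standing in for AllFoos.
    static readonly int[] AllFoos = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };

    // All of the "do we want this one?" logic lives inside the iterator.
    public static IEnumerable<int> GetSomeFoos()
    {
        foreach (int foo in AllFoos)
        {
            if (foo % 4 == 0)
                yield return foo;   // one case we care about (placeholder rule)
            else if (foo > 10)
                yield return foo;   // another case (placeholder rule)
        }
    }

    public static void Main()
    {
        // The consuming loop only contains the work itself.
        foreach (int foo in GetSomeFoos())
            Console.WriteLine(foo);   // prints 4, 8, 11, 12 on separate lines
    }
}
```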

Here's my previous accepted answer to exactly the same question:

Yield keyword value added?

Another way to look at iterator methods is that they do the hard work of turning an algorithm "inside out". Consider a parser. It pulls text from a stream, looks for patterns in it and generates a high-level logical description of the content.

Now, I can make this easy for myself as a parser author by taking the SAX approach, in which I have a callback interface that I notify whenever I find the next piece of the pattern. So in the case of SAX, each time I find the start of an element, I call the beginElement method, and so on.

But this creates trouble for my users. They have to implement the handler interface, and so they have to write a state machine class that responds to the callback methods. This is hard to get right, so the easiest thing to do is use a stock implementation that builds a DOM tree, and then they will have the convenience of being able to walk the tree. But then the whole structure gets buffered up in memory - not good.

But how about if I instead write my parser as an iterator method?

IEnumerable<LanguageElement> Parse(Stream stream)
{
    // imperative code that pulls from the stream and occasionally 
    // does things like:

    yield return new BeginStatement("if");

    // and so on...
}

That will be no harder to write than the callback-interface approach - just yield return an object derived from my LanguageElement base class instead of calling a callback method.

The user can now use foreach to loop through my parser's output, so they get a very convenient imperative programming interface.

The result is that both sides of a custom API look like they're in control, and hence are easier to write and understand.
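As a toy illustration of both sides "being in control", here is a sketch of a tiny tokenizer written as an iterator. It is not the parser described above: the elements are plain strings rather than a LanguageElement hierarchy, and the splitting rules (letters and digits form words, any other non-whitespace character is its own element) are assumptions chosen to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class TokenizerDemo
{
    // Producer side: ordinary imperative code that pulls characters from the
    // reader and yields an element whenever one is complete.
    public static IEnumerable<string> Parse(TextReader reader)
    {
        var word = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            if (char.IsLetterOrDigit(ch))
            {
                word.Append(ch);
                continue;
            }
            if (word.Length > 0)
            {
                yield return word.ToString();   // a finished word
                word.Clear();
            }
            if (!char.IsWhiteSpace(ch))
                yield return ch.ToString();     // punctuation as its own element
        }
        if (word.Length > 0)
            yield return word.ToString();       // trailing word at end of input
    }

    public static void Main()
    {
        // Consumer side: a plain foreach, no handler interface or state machine.
        foreach (string element in Parse(new StringReader("if (x) y = 42;")))
            Console.WriteLine(element);
    }
}
```

The compiler generates the state machine that the SAX-style user would otherwise have to write by hand, and nothing beyond the current element is buffered.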

The basic reason for using yield is that it generates and returns a sequence by itself, without you having to build a list by hand. We can then iterate over the returned sequence.

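Notice: The technical posts on this site are published under the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.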

 