C# how to yield return SelectMany?

Let's say I have the following generic combination generator static method:

public static IEnumerable<IEnumerable<T>> GetAllPossibleCombos<T>(
    IEnumerable<IEnumerable<T>> items)
{
    IEnumerable<IEnumerable<T>> combos = new[] {new T[0]};

    foreach (var inner in items)
        combos = combos.SelectMany(c => inner, (c, i) => c.Append(i));

    return combos;
}

Perhaps I am not understanding this correctly, but doesn't this build the entire combos list in RAM? If there are a large number of items the method might cause the computer to run out of RAM.

Is there a way to re-write the method to use a yield return on each combo, instead of returning the entire combos set?

There are some misconceptions in your question, which is awesome because now you have an opportunity to learn facts rather than myths.


First off, the method you are implementing is usually called CartesianProduct, not GetAllPossibleCombos, so consider renaming it.


Perhaps I am not understanding this correctly

You are not understanding it correctly.

doesn't this build the entire combos list in RAM?

No. A query builder builds a query, not the results of executing the query. When you do a SelectMany, what you get is an object that will do the selection in the future. You don't get the results of that selection.
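To see the deferral for yourself, here is a minimal sketch. The class and method names are made up for illustration. It counts how often the collection selector runs before and after the query is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical helper: reports how many times the collection selector
    // ran before enumeration, after enumeration, and how many results came out.
    public static (int Before, int After, int Produced) CountSelectorCalls()
    {
        int calls = 0;
        int[] outer = { 1, 2, 3 };
        int[] inner = { 10, 20 };

        // Building the query runs neither lambda; it only records what to do.
        IEnumerable<int> query = outer.SelectMany(
            o => { calls++; return inner; },
            (o, i) => o * i);

        int before = calls;            // nothing has been enumerated yet

        int produced = query.Count();  // enumeration happens here
        int after = calls;

        return (before, after, produced);
    }
}
```

Calling DeferredExecutionDemo.CountSelectorCalls() should report zero selector calls before enumeration and one call per outer element afterwards.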

If there are a large number of items the method might cause the computer to run out of RAM.

Today would be a good day to stop thinking of memory and RAM as the same thing. When a process runs out of memory, it does not run out of RAM. It runs out of address space, which is not RAM. The better way to think about memory is: memory is on-disk page file, and RAM is special hardware that makes your page file faster. When you run out of RAM, your machine might get unacceptably slow, but you don't run out of memory until you run out of address space. Remember, process memory is virtualized.

Now, there may be scenarios in which executing this code is inefficient because enumerating the query runs out of stack. And there may be scenarios in which execution becomes inefficient because you're moving n items up a stack n deep. I suggest that you do a deeper analysis of your code and see if that is the case, and report back.


Is there a way to re-write the method to use a yield return on each combo, instead of returning the entire combos set?

SelectMany is implemented as a yield return in a foreach loop, so you've already implemented it as a yield return on each combo; you've just hidden the yield return inside a call to SelectMany.

That is, SelectMany<A, B, C>(IE<A> items, Func<A, IE<B>> f, Func<A, B, C> g), writing IE as shorthand for IEnumerable, is implemented as something like:

foreach(A a in items)
  foreach(B b in f(a))
    yield return g(a, b);

So you've already done it in yield return.
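One way to convince yourself that the combos are streamed rather than built up front is to ask for only the first few of a very large product. The sketch below reuses the method from the question unchanged; the wrapper class, the FirstCombos helper, and the choice of three ranges of 1000 items are assumptions made for illustration. It also assumes a framework version that ships Enumerable.Append.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyCombosDemo
{
    // The method from the question, unchanged apart from formatting.
    public static IEnumerable<IEnumerable<T>> GetAllPossibleCombos<T>(
        IEnumerable<IEnumerable<T>> items)
    {
        IEnumerable<IEnumerable<T>> combos = new[] { new T[0] };

        foreach (var inner in items)
            combos = combos.SelectMany(c => inner, (c, i) => c.Append(i));

        return combos;
    }

    // Hypothetical helper: pulls only the first `count` combos out of the
    // 1000 * 1000 * 1000 that the full product would contain.
    public static List<int[]> FirstCombos(int count)
    {
        var items = Enumerable.Repeat(Enumerable.Range(0, 1000), 3);

        return GetAllPossibleCombos(items)
            .Take(count)                  // stops the enumeration early
            .Select(c => c.ToArray())     // materialize just these combos
            .ToList();
    }
}
```

If the whole product were built eagerly, LazyCombosDemo.FirstCombos(2) would have to produce a billion combos first. Because each SelectMany layer yields one result at a time, only the work needed for the requested combos is done.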

If you want to write a method that directly does a yield return, that's a little harder; the easiest way to do that is to form an array of enumerators, one for each child sequence, then make a vector from the Current of each enumerator, yield return the vector, and then advance the correct iterator one step. Keep on doing that until there is no longer a correct iterator to advance.

As you can probably tell from that description, the bookkeeping gets messy. It is doable, but it's not very pleasant code to write. Give it a try though! The nice thing about that solution is that you are guaranteed to have good performance because you're not consuming any stack.
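For reference, here is one possible sketch of the enumerator-array approach described above. It is not a reviewed or canonical implementation. The names are made up, and it assumes every child sequence can be enumerated more than once and yields the same items each time, because exhausted enumerators are restarted by calling GetEnumerator again.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IterativeCartesianProduct
{
    // Hypothetical name; yields one freshly allocated array per combination.
    public static IEnumerable<T[]> CartesianProductIterative<T>(
        IEnumerable<IEnumerable<T>> sequences)
    {
        // Only the outer sequence is copied; child sequences are re-enumerated.
        IEnumerable<T>[] sources = sequences.ToArray();
        var enumerators = new IEnumerator<T>[sources.Length];
        try
        {
            // Put every enumerator on its first element. If any child
            // sequence is empty, the whole product is empty.
            for (int i = 0; i < sources.Length; i++)
            {
                enumerators[i] = sources[i].GetEnumerator();
                if (!enumerators[i].MoveNext())
                    yield break;
            }

            while (true)
            {
                // Make a vector from the Current of each enumerator.
                var combo = new T[enumerators.Length];
                for (int i = 0; i < combo.Length; i++)
                    combo[i] = enumerators[i].Current;
                yield return combo;

                // Advance like an odometer: try the last position first and
                // restart every exhausted enumerator on the way left.
                int pos = enumerators.Length - 1;
                while (pos >= 0 && !enumerators[pos].MoveNext())
                {
                    enumerators[pos].Dispose();
                    enumerators[pos] = sources[pos].GetEnumerator();
                    enumerators[pos].MoveNext(); // assumed non-empty, see above
                    pos--;
                }

                if (pos < 0)
                    yield break; // every position wrapped around, so we are done
            }
        }
        finally
        {
            foreach (var e in enumerators)
                e?.Dispose();
        }
    }
}
```

With the inputs { 1, 2 } and { 3, 4 }, CartesianProductIterative<int> yields the combos in the same order as the SelectMany version: the last position varies fastest. With no input sequences at all it yields a single empty combo, which also matches the original method.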

UPDATE: This related question has an answer posted that uses an iterative algorithm, but I have not reviewed it to see if it is correct. https://stackoverflow.com/a/57683769/88656


Finally, I encourage you to compare your implementation to mine:

https://ericlippert.com/2010/06/28/computing-a-cartesian-product-with-linq/

Is my implementation in any way fundamentally different than yours, or are we doing the same thing, just using slightly different syntax? Give that some thought.

Also, I encourage you to read Ian Griffiths' excellent six-part series analyzing various implementations of this function:

http://www.interact-sw.co.uk/iangblog/2010/07/28/linq-cartesian-1

SelectMany and other Linq methods return an IEnumerable, which is lazily evaluated: nothing runs until the collection is enumerated. Enumeration can take the form of a ToList() or ToArray() call, or of iterating over it in a foreach loop. When you see the message in the debugger warning that expanding a collection will enumerate the enumerable, this is the behavior it is warning you about. The collection hasn't been enumerated yet - a Linq query only builds up a chain of calls that tells it how to enumerate the data.

So, your concern about RAM usage isn't necessarily accurate (depending on the concrete type of the starting IEnumerable). Even if you call ToList() or ToArray() and store a reference to the result in a variable, if the collection elements are reference types then the new collection holds references to the same objects, not copies of them.

In your example, yield return gives you convenience if you want to lazily build a collection of elements without storing it in a separate collection (eg a return list or array, which requires additional copying). I don't think it applies to what you're trying to do, since SelectMany has this behavior already.

If you want to try it out, Linq makes it pretty easy to generate large lists with Enumerable.Repeat:

// Define a collection with 10000000 items (items not created yet)
var manyItems = Enumerable.Repeat(123, 10000000);

// Enumerate the enumerable via ToList: creates the int 10000000 times
var manyItemsConcrete = manyItems.ToList();

// same deal with reference types (one object is created here; the
// resulting list holds 10000000 references to that same object)
var manyReferenceTypes = Enumerable.Repeat(new object(), 10000000);
var manyReferenceTypesConcrete = manyReferenceTypes.ToList();

// This list already exists in RAM taking up space
var list = new List<object> { new object(), new object() /* ... x10000000 */ };
// This defines a transform on list, but doesn't take up RAM
var enumerable = list.Select(x => x.ToString());

// Now, there are two lists taking up RAM
var newList = enumerable.ToList();
