
Why does LINQ not cache enumerations?

So it is my understanding that LINQ does not execute everything immediately; it simply stores the information needed to get at the data. So if you call Where, nothing actually happens to the list; you just get an IEnumerable that has the information it needs to produce the results.

One can 'collapse' this information into an actual list by calling ToList.
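To make the deferral concrete, here is a minimal sketch that counts how often the predicate runs. The class name DeferredDemo and its Run method are made up for illustration; only Enumerable.Range, Where and ToList come from LINQ itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns the number of predicate calls before enumeration and after ToList,
    // plus the number of items that ended up in the materialized list.
    public static (int before, int after, int itemCount) Run()
    {
        int calls = 0;

        // Building the query runs nothing; the lambda is only stored.
        IEnumerable<int> query = Enumerable.Range(1, 10).Where(i =>
        {
            calls++;
            return i % 2 == 0;
        });
        int before = calls;

        // ToList enumerates the query once and copies the results into a List<int>.
        List<int> evens = query.ToList();

        return (before, calls, evens.Count);
    }

    public static void Main()
    {
        var (before, after, itemCount) = Run();
        Console.WriteLine("Predicate calls before enumeration: " + before); // 0
        Console.WriteLine("Predicate calls after ToList: " + after);        // 10
        Console.WriteLine("Items in the list: " + itemCount);               // 5
    }
}
```

The list returned by ToList is an ordinary List<int>, so reading it again does not run the predicate a second time.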

Now I am wondering: why would the LINQ team implement it like this? It would be pretty easy to add a List (or a Dictionary) at each step to cache the results that have already been calculated, so I guess there must be a good reason.

The lack of caching can be checked with this code:

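using System;
using System.Linq;

// Where is deferred: the lambda runs every time the query is enumerated.
var list = Enumerable.Range(1, 10).Where(i => {
    Console.WriteLine("Enumerating: " + i);
    return true;
});

// All has to visit every element, so this prints 1 through 10.
var list2 = list.All(i => {
    return true;
});

// Any never finds a match, so it enumerates again and prints 1 through 10 a second time.
var list3 = list.Any(i => {
    return false;
});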

If there were a cache, this would output Enumerating: i only once for each number; the second time, it would get the items from the cache.

Edit: Additional question: why does LINQ not include a cache option, like .Cache(), to cache the result of the previous enumerable?

"It is pretty easy to add a List at each step"

Yes, and very memory intensive. What if the data set contains 2 GB of data in total and you have to hold all of it in memory at once? If you iterate over it and fetch it in parts, there is not a lot of memory pressure. When you materialize 2 GB into memory, there is, and just imagine what happens if every step does the same...

You know your code and your specific use case, so only you as a developer can determine when it is useful to materialize part of a pipeline in memory. The framework can't know that.
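As a rough illustration of the difference, the sketch below runs the same pipeline twice: once streaming, and once with a list after every step, which is roughly what a built-in per-step cache would amount to. ReadRecords is a hypothetical stand-in for a large data source, and the other names are made up as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Hypothetical data source: yields records one at a time instead of loading them all.
    public static IEnumerable<long> ReadRecords(int count)
    {
        for (long i = 1; i <= count; i++)
            yield return i;
    }

    // Streams through the pipeline; no intermediate collection is ever built.
    public static long SumOfEvenSquaresStreaming(int count)
    {
        return ReadRecords(count).Where(x => x % 2 == 0).Select(x => x * x).Sum();
    }

    // Buffers every step in its own list, as a cache at each step would.
    public static long SumOfEvenSquaresBuffered(int count)
    {
        List<long> all = ReadRecords(count).ToList();
        List<long> evens = all.Where(x => x % 2 == 0).ToList();
        List<long> squares = evens.Select(x => x * x).ToList();
        return squares.Sum();
    }

    public static void Main()
    {
        const int count = 1000000;
        long streamed = SumOfEvenSquaresStreaming(count);
        long buffered = SumOfEvenSquaresBuffered(count);
        Console.WriteLine("Same result: " + (streamed == buffered));
    }
}
```

Both versions compute the same sum, but the buffered one keeps three lists alive while it runs, and their size grows with the input.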

Because it makes no sense, and if you thought about all the cases where it makes no sense, you would not ask. This is not so much a "does it sometimes make sense" question as an "are there side effects that make it bad" question. Next time you evaluate something like this, think about the negatives:

  • Memory consumption goes up, because the results HAVE to be cached even when that is not wanted.
  • On the next run, the results may be different because the incoming data may have changed. Your simplistic example (Enumerable.Range) has no issue with that, but when you filter a list of customers, the customers may have been updated in the meantime.
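The second point can be seen in a small sketch. The customer names and the StaleCacheDemo class are invented for illustration; the query re-reads the source list on every enumeration, while a ToList snapshot does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StaleCacheDemo
{
    // Returns the match count seen by the live query and by the earlier snapshot.
    public static (int live, int snapshot) Run()
    {
        var customers = new List<string> { "Alice", "Bob" };

        // Deferred query: re-reads the customers list every time it is enumerated.
        IEnumerable<string> query =
            customers.Where(c => c.StartsWith("A", StringComparison.Ordinal));

        // Explicit buffer: a copy of the results as they are right now.
        List<string> snapshot = query.ToList();

        // The underlying data changes after the snapshot was taken.
        customers.Add("Anna");

        return (query.Count(), snapshot.Count);
    }

    public static void Main()
    {
        var (live, snapshot) = Run();
        Console.WriteLine("Live query matches: " + live);    // 2
        Console.WriteLine("Snapshot matches: " + snapshot);  // 1
    }
}
```

An automatic cache would behave like the snapshot here and silently return stale results.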

Stuff like that makes it very hard to sensibly take the choice away from the developer. Want a buffer? Make one (easily). But as a built-in default, the side effects would be bad.
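The simplest buffer is a ToList call at the point where you want one. If you would rather have something closer to the .Cache() idea from the question, a lazy version might look like the sketch below. Note that Cache and CachedEnumerable are hypothetical names, not part of LINQ, and that this version is not thread-safe and keeps every item it has seen for as long as the wrapper lives.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: pulls items from the source on demand and remembers them.
public sealed class CachedEnumerable<T> : IEnumerable<T>
{
    private readonly List<T> _cache = new List<T>();
    private readonly IEnumerable<T> _source;
    private IEnumerator<T> _enumerator;
    private bool _exhausted;

    public CachedEnumerable(IEnumerable<T> source)
    {
        _source = source;
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int index = 0; ; index++)
        {
            // Serve from the cache when possible; otherwise pull one more item.
            if (index >= _cache.Count && !TryCacheNext())
                yield break;
            yield return _cache[index];
        }
    }

    // Moves the shared source enumerator forward by one item and stores it.
    private bool TryCacheNext()
    {
        if (_exhausted) return false;
        if (_enumerator == null) _enumerator = _source.GetEnumerator();
        if (_enumerator.MoveNext())
        {
            _cache.Add(_enumerator.Current);
            return true;
        }
        _enumerator.Dispose();
        _exhausted = true;
        return false;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class CacheExtensions
{
    // Hypothetical extension method; LINQ itself has no Cache().
    public static CachedEnumerable<T> Cache<T>(this IEnumerable<T> source)
    {
        return new CachedEnumerable<T>(source);
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        int calls = 0;
        var cached = Enumerable.Range(1, 10)
            .Where(i => { calls++; return true; })
            .Cache();

        cached.ToList();
        cached.ToList();
        Console.WriteLine("Predicate calls after two passes: " + calls); // 10
    }
}
```

Even this small version carries both drawbacks listed above: it holds on to memory, and it will not notice changes in the source once an item has been cached.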
