
Improve LINQ performance

I have a LINQ statement like this:

var records = from line in myfile 
              let data = line.Split(',')
              select new { a=int.Parse(data[0]), b=int.Parse(data[1]) };
var average = records.Sum(r => r.b)!=0?records.Sum(r => r.a) / records.Sum(r => r.b):0;

My question is: how many times is records.Sum(r => r.b) computed in the last line? Does LINQ loop over all the records each time it needs to compute a sum (in this case, three Sum() calls, so three loops)? Or does it smartly loop over all the records just once and compute all the sums?


Edit 1:

  1. I wonder if there is any way to improve this by going through all the records only once (as we would with a plain for loop)?

  2. And there is really no need to load everything into memory before we can compute the sum and average. Surely we can sum each element while loading it from the file. Is there any way to reduce the memory consumption as well?


Edit 2

Just to clarify a bit: I didn't use LINQ before I ended up with the code above. Plain while/for loops met all the performance requirements. But I then tried to improve readability and reduce the lines of code by using LINQ. It seems that we can't get both at the same time.

Twice. Write it like this and it will be computed once:

var sum = records.Sum(r => r.b);

var average = sum != 0 ? records.Sum(r => r.a) / sum : 0;
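To check the pass counts discussed in this thread, the sketch below wraps the input in a hypothetical CountingLines() iterator (a stand-in for myfile, with made-up sample lines) that increments a counter each time an enumeration starts. It assumes C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many times an enumeration of the source is started.
int passes = 0;

// Hypothetical stand-in for `myfile`: lazily yields "a,b" lines and bumps
// `passes` each time a new enumeration begins (on the first MoveNext).
IEnumerable<string> CountingLines()
{
    passes++;
    yield return "10,2";
    yield return "20,3";
}

var records = from line in CountingLines()
              let data = line.Split(',')
              select new { a = int.Parse(data[0]), b = int.Parse(data[1]) };

// The expression from the question: b-sum, then a-sum, then b-sum again.
passes = 0;
var average1 = records.Sum(r => r.b) != 0 ? records.Sum(r => r.a) / records.Sum(r => r.b) : 0;
int passesOriginal = passes;

// Caching the b-sum leaves one pass for b and one for a.
passes = 0;
var sumB = records.Sum(r => r.b);
var average2 = sumB != 0 ? records.Sum(r => r.a) / sumB : 0;
int passesCached = passes;

Console.WriteLine($"original: {passesOriginal} passes, cached: {passesCached} passes");
```

With these sample lines it prints `original: 3 passes, cached: 2 passes`: caching removes the duplicated b-sum, but the query is still enumerated (and every line re-split and re-parsed) once per Sum call.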

There are plenty of answers, but none that wraps up all of your questions.

How many times is records.Sum(r => r.b) computed in the last line?

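Three times in total: the expression makes three Sum calls (the r.b sum is computed twice when it is non-zero, the r.a sum once).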

Does LINQ loop over all the records each time it needs to compute a sum (in this case, three Sum() calls, so three loops)?

Yes.

Or does it smartly loop over all the records just once and compute all the sums?

No.

I wonder if there is any way to improve this by going through all the records only once (as we would with a plain for loop)?

You can do that, but with separate Sum calls it requires you to eagerly load all the data first (for example with ToList()), which contradicts your next question.

And there is really no need to load everything into memory before we can compute the sum and average. Surely we can sum each element while loading it from the file. Is there any way to reduce the memory consumption as well?

That's correct. In your original post you have a variable called myfile that you iterate over, putting each item into a local variable called line (read: basically a foreach). Since you didn't show how you got your myfile data, I'm assuming that you're eagerly loading all the data.

Here's a quick example of lazy-loading your data:

using System;
using System.Collections.Generic;
using System.IO;

// The class name is illustrative; the answer only showed the two methods.
public class DataCalculator
{
    // Lazily yields one line at a time; nothing is read until the caller enumerates.
    public IEnumerable<string> GetData()
    {
        using (var fileStream = File.OpenRead(@"C:\Temp\MyData.txt"))
        {
            using (var streamReader = new StreamReader(fileStream))
            {
                string line;
                while ((line = streamReader.ReadLine()) != null)
                {
                    yield return line;
                }
            }
        }
    }

    public int CalculateSumAndAverage()
    {
        var sumA = 0;
        var sumB = 0;
        var average = 0;

        // A single pass over the lazily loaded lines.
        foreach (var line in GetData())
        {
            var split = line.Split(',');
            var a = Convert.ToInt32(split[0]);
            var b = Convert.ToInt32(split[1]);

            sumA += a;
            sumB += b;
        }

        // I'm not a big fan of ternary operators,
        // but feel free to convert this if you so desire.
        if (sumB != 0)
        {
            average = sumA / sumB;
        }
        else
        {
            // This else clause is redundant, but I converted it from a ternary operator.
            average = 0;
        }

        return average;
    }
}
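As a side note, .NET already provides a lazy line reader: File.ReadLines returns an IEnumerable<string> that reads lines on demand as it is enumerated, whereas File.ReadAllLines loads every line into a string[] up front. A minimal sketch of the same single-pass sum using it; the temp file and its two sample lines are made up for illustration, and C# 9+ top-level statements are assumed.

```csharp
using System;
using System.IO;

// Illustrative input only: write a small two-column file to a temp path.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[] { "10,2", "20,3" });

// File.ReadLines reads lines on demand as the foreach pulls them,
// so the whole file is never materialized as an array.
int totalA = 0, totalB = 0;
foreach (string line in File.ReadLines(path))
{
    string[] fields = line.Split(',');
    totalA += int.Parse(fields[0]);
    totalB += int.Parse(fields[1]);
}

int mean = totalB != 0 ? totalA / totalB : 0;
Console.WriteLine(mean); // 6

File.Delete(path);
```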

Three times, and what you should use here is Aggregate, not Sum.

// do your original selection
var records = from line in myfile 
              let data = line.Split(',')
              select new { a=int.Parse(data[0]), b=int.Parse(data[1]) };
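// aggregate them into one running total in a single pass
// (anonymous-type properties are read-only, so accumulate into a tuple seed instead;
// the seed also makes an empty file return (0, 0) instead of throwing)
var sumRec = records.Aggregate((a: 0, b: 0),
                               (runningSum, next) => (a: runningSum.a + next.a, b: runningSum.b + next.b));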
// Calculate your average
var average = sumRec.b != 0 ? sumRec.a / sumRec.b : 0;
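A self-contained version of the same Aggregate idea, as a sketch: it assumes C# 9+ (top-level statements, value tuples), and the ReadLines helper over a TextReader is a hypothetical stand-in for reading the file (File.ReadLines(path) would play the same role). The whole pipeline is lazy, so each line is split, parsed and folded into the running totals in a single pass.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical lazy line source over any TextReader (a StreamReader for a file,
// a StringReader for tests).
IEnumerable<string> ReadLines(TextReader reader)
{
    string? line;
    while ((line = reader.ReadLine()) != null)
        yield return line;
}

int AverageOf(TextReader reader)
{
    var totals = ReadLines(reader)
        .Select(line => line.Split(','))
        .Select(data => (a: int.Parse(data[0]), b: int.Parse(data[1])))
        // One pass: each record is folded into the running (a, b) totals.
        // The (0, 0) seed also makes an empty input return (0, 0) instead of throwing.
        .Aggregate((a: 0, b: 0), (acc, r) => (a: acc.a + r.a, b: acc.b + r.b));

    return totals.b != 0 ? totals.a / totals.b : 0;
}

Console.WriteLine(AverageOf(new StringReader("10,2\n20,3")));  // 6
Console.WriteLine(AverageOf(new StringReader("")));            // 0
```

The seed is the key design choice here: the seedless Aggregate overload throws InvalidOperationException on an empty sequence, and a value tuple can be rebuilt on every step where an anonymous type cannot be modified.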

Each call to the Sum method iterates through all lines in myfile. To improve performance, write:

var records = (from line in myfile 
          let data = line.Split(',')
          select new { a=int.Parse(data[0]), b=int.Parse(data[1]) }).ToList();

This creates a list of all elements (with a and b properties), and each subsequent call to Sum iterates through that list without splitting and parsing the data again. Of course, you can go further and store the result of Sum in a temporary variable.
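A small sketch to make this concrete; CountedParse and the sample lines are hypothetical, used only to count how often parsing happens (C# 9+ top-level statements assumed). With ToList() the projection runs once per line, and the Sum calls that follow only walk the in-memory list.

```csharp
using System;
using System.Linq;

int parseCalls = 0;

// Hypothetical wrapper around int.Parse so the number of parses can be counted.
int CountedParse(string s)
{
    parseCalls++;
    return int.Parse(s);
}

string[] sampleLines = { "10,2", "20,3", "30,5" };

var cached = (from line in sampleLines
              let data = line.Split(',')
              select new { a = CountedParse(data[0]), b = CountedParse(data[1]) }).ToList();

// ToList() has already run the projection: 3 lines x 2 fields = 6 parses.
int parsesAfterToList = parseCalls;

var sumOfB = cached.Sum(r => r.b);
var avg = sumOfB != 0 ? cached.Sum(r => r.a) / sumOfB : 0;

// The Sum calls walk the in-memory list, so no further parsing happens.
Console.WriteLine($"{parseCalls} parses, average {avg}"); // 6 parses, average 6
```

The trade-off is that the whole list is held in memory, which is what question 2 wanted to avoid.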

james, I'm not an expert at all, but this is my idea. I think it can be reduced to one pass, maybe at the cost of a little more code. records is still an IEnumerable of an anonymous type {int a, int b}.

*dynamic was a quick way to solve it. You should write a struct for it.

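int sum_a = 0, sum_b = 0;
Func<string[], dynamic> b = (string[] data) => {
    // Parse each field once, add it to the running sums, and return the record.
    int first = int.Parse(data[0]);
    int second = int.Parse(data[1]);
    sum_a += first;
    sum_b += second;
    return new { a = first, b = second };
};
var records = from line in fileLines
              let data = line.Split(',')
              let result = b(data)
              select new { a = (int)result.a, b = (int)result.b };
// The query is deferred: sum_a and sum_b are only updated while records is enumerated,
// so enumerate it (here, or wherever records is consumed) before computing the average.
foreach (var record in records) { }
var average = sum_b != 0 ? sum_a / sum_b : 0;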

With a struct instead of dynamic, it's just as simple:

public struct Int_Int // May be a class or interface for mapping
{
    public int a, b; // both default to 0
}

Then:

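int sum_a = 0, sum_b = 0;
Func<string[], Int_Int> b = (string[] data) => {
    // Parse each field once, add it to the running sums, and return the record.
    int first = int.Parse(data[0]);
    int second = int.Parse(data[1]);
    sum_a += first;
    sum_b += second;
    return new Int_Int() { a = first, b = second };
};
var records = from line in fileLines
              let data = line.Split(',')
              select b(data);
// The query is deferred: enumerate records before reading sum_a and sum_b.
foreach (var record in records) { }
var average = sum_b != 0 ? sum_a / sum_b : 0;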
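Because LINQ queries are deferred, the projection in the side-effect approach (and with it the sum_a/sum_b updates) only runs when records is actually enumerated; computing average straight after defining the query would see both sums still at 0. A minimal sketch with made-up sample lines and a hypothetical Accumulate helper (C# 9+ top-level statements assumed):

```csharp
using System;
using System.Linq;

// Made-up sample input standing in for fileLines.
string[] fileLines = { "10,2", "20,3" };
int sum_a = 0, sum_b = 0;

// Hypothetical helper: parses one record and adds it to the running sums.
(int a, int b) Accumulate(string[] data)
{
    int a = int.Parse(data[0]);
    int b = int.Parse(data[1]);
    sum_a += a;
    sum_b += b;
    return (a, b);
}

var records = from line in fileLines
              let data = line.Split(',')
              select Accumulate(data);

// Defining the query ran nothing, so both sums are still 0 here.
int beforeA = sum_a, beforeB = sum_b;

// Enumerating the query runs the projection once per line.
foreach (var _ in records) { }

var average = sum_b != 0 ? sum_a / sum_b : 0;
Console.WriteLine($"before: {beforeA}/{beforeB}, after: {sum_a}/{sum_b}, average {average}");
```

It prints `before: 0/0, after: 30/5, average 6`. Note that enumerating records a second time would add every line to the sums again, so this approach only stays correct if the query is consumed exactly once.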

Sum enumerates all the records every time you call it, so I recommend calling ToList() first (see "Do you ToList()?"):

var records = (from line in myfile
               let data = line.Split(',')
               select new { a = int.Parse(data[0]), b = int.Parse(data[1]) }).ToList();

var sumb = records.Sum(r => r.b);
var average = sumb !=0?records.Sum(r => r.a) / sumb :0;
