LINQ GroupBy extremely slow

The following code takes 5 minutes to run on 100,000 rows. That seems crazy to me. What am I doing wrong?

        var query =
            from foo in fooStuff.AsEnumerable()
            group foo by foo.Field<Int64>("FooID") into g
            select new
            {
                FooID = g.Key,
                FooTier = g.Min(foo => foo.Field<int>("Tier"))
            };

Note: this is running on Mono.

You are materializing all the entities when you call AsEnumerable(), so your grouping is being done in memory. Try removing that part so that the grouping is done at the database level:

var query =
    from foo in fooStuff
    group foo by foo.FooID into g
    select new
    {
        FooID = g.Key,
        FooTier = g.Min(foo => foo.Tier)
    };

It is not a direct comparison and isn't on Mono, but I have some code that does something similar with a 6 MB XML file, which I read into a DataSet. It has 30,000 rows and takes 0.5 seconds, so I don't think the GroupBy itself is causing the problem.

To diagnose further, I would suggest:

  • Testing how long it takes to read the information into a list, i.e.

    var fooList = fooStuff.AsEnumerable().ToList();

  • Testing how long it takes if you change the query to use fooList instead of fooStuff

  • Testing how long it takes if you remove FooTier = g.Min(foo => foo.Tier) from the select

  • Separating the .Field<> calls from the GroupBy and timing each section, i.e. first read the information from the DataTable into a list, e.g.

    var list2 = (from foo in fooStuff.AsEnumerable()
                 select new
                 {
                     FooID = foo.Field<Int64>("FooID"),
                     Tier = foo.Field<int>("Tier")
                 }).ToList();

    Then you can query this list:

    var query = from foo in list2
                group foo by foo.FooID into g
                select new { FooID = g.Key, FooTier = g.Min(foo => foo.Tier) };
    var results = query.ToList();
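The timing steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's code: the table is synthetic, BuildSampleTable is a hypothetical helper, and the key distribution (1,000 distinct FooID values, Tier between 0 and 6) is made up. Each phase is measured with Stopwatch from System.Diagnostics.

```csharp
using System;
using System.Data;
using System.Diagnostics;
using System.Linq;

public static class GroupByTiming
{
    // Hypothetical helper: builds an in-memory table shaped like the one in the question.
    public static DataTable BuildSampleTable(int rowCount)
    {
        var table = new DataTable("Foo");
        table.Columns.Add("FooID", typeof(long));
        table.Columns.Add("Tier", typeof(int));
        for (int i = 0; i < rowCount; i++)
        {
            // Assumed distribution: 1,000 distinct keys, tiers 0..6.
            table.Rows.Add((long)(i % 1000), i % 7);
        }
        return table;
    }

    public static void Main()
    {
        DataTable fooStuff = BuildSampleTable(100000);

        // Phase 1: copy the rows into a list.
        var sw = Stopwatch.StartNew();
        var fooList = fooStuff.AsEnumerable().ToList();
        Console.WriteLine("Rows to list:      {0} ms", sw.ElapsedMilliseconds);

        // Phase 2: pull the two fields out of each row with .Field<>.
        sw.Restart();
        var list2 = fooList
            .Select(foo => new
            {
                FooID = foo.Field<long>("FooID"),
                Tier = foo.Field<int>("Tier")
            })
            .ToList();
        Console.WriteLine("Field<> extraction: {0} ms", sw.ElapsedMilliseconds);

        // Phase 3: group the plain objects and take the minimum tier per key.
        sw.Restart();
        var results = list2
            .GroupBy(foo => foo.FooID)
            .Select(g => new { FooID = g.Key, FooTier = g.Min(foo => foo.Tier) })
            .ToList();
        Console.WriteLine("GroupBy + Min:      {0} ms", sw.ElapsedMilliseconds);
        Console.WriteLine("Groups produced:    {0}", results.Count);
    }
}
```

If one phase dominates the total, that is the part worth investigating on Mono; with real data, replace BuildSampleTable with however fooStuff is actually loaded.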

If this query is slow, it would suggest that there is something wrong with Mono's implementation of GroupBy. You might be able to verify that by using something like this:

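    using System;
    using System.Collections.Generic;

    // Extension methods must live in a static class; the class name here is arbitrary.
    public static class GroupByDiagnostics
    {
        public static Dictionary<TKey, List<TSrc>> TestGroupBy<TSrc, TKey>(
            this IEnumerable<TSrc> src, Func<TSrc, TKey> groupFunc)
        {
            var dict = new Dictionary<TKey, List<TSrc>>();

            foreach (TSrc s in src)
            {
                TKey key = groupFunc(s);
                List<TSrc> list;

                // Create the bucket the first time a key is seen.
                if (!dict.TryGetValue(key, out list))
                {
                    list = new List<TSrc>();
                    dict.Add(key, list);
                }
                list.Add(s);
            }

            return dict;
        }
    }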

And to use it:

  var results = list2.TestGroupBy(r => r.FooID)
      .Select(r => new { FooID = r.Key, FooTier = r.Value.Min(r1 => r1.Tier) });

Note, this is not meant as a replacement for GroupBy and does not cope with null keys, but it should be enough to determine whether there is a problem with GroupBy (assuming Mono's implementations of Dictionary and List are OK).
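One way to run that comparison is sketched below, under assumptions: the data is synthetic (100,000 made-up rows over 1,000 keys), the class name GroupByComparison is hypothetical, and the dictionary-based helper is repeated only so the sketch compiles on its own. It times the built-in GroupBy against TestGroupBy and checks that both give the same minimum tier per key.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class GroupByComparison
{
    // Same dictionary-based grouping as the helper above, repeated for self-containment.
    public static Dictionary<TKey, List<TSrc>> TestGroupBy<TSrc, TKey>(
        this IEnumerable<TSrc> src, Func<TSrc, TKey> groupFunc)
    {
        var dict = new Dictionary<TKey, List<TSrc>>();
        foreach (TSrc s in src)
        {
            TKey key = groupFunc(s);
            List<TSrc> list;
            if (!dict.TryGetValue(key, out list))
            {
                list = new List<TSrc>();
                dict.Add(key, list);
            }
            list.Add(s);
        }
        return dict;
    }

    public static void Main()
    {
        // Assumed test data: 100,000 (FooID, Tier) pairs spread over 1,000 keys.
        var rows = Enumerable.Range(0, 100000)
            .Select(i => new { FooID = (long)(i % 1000), Tier = i % 7 })
            .ToList();

        // Built-in LINQ GroupBy.
        var sw = Stopwatch.StartNew();
        var builtIn = rows.GroupBy(r => r.FooID)
            .ToDictionary(g => g.Key, g => g.Min(r => r.Tier));
        Console.WriteLine("Built-in GroupBy: {0} ms", sw.ElapsedMilliseconds);

        // Hand-rolled dictionary grouping.
        sw.Restart();
        var manual = rows.TestGroupBy(r => r.FooID)
            .ToDictionary(kv => kv.Key, kv => kv.Value.Min(r => r.Tier));
        Console.WriteLine("TestGroupBy:      {0} ms", sw.ElapsedMilliseconds);

        // Both approaches should agree on the minimum tier for every key.
        bool same = builtIn.Count == manual.Count
            && builtIn.All(kv => manual[kv.Key] == kv.Value);
        Console.WriteLine("Same results:     {0}", same);
    }
}
```

A large gap between the two timings on Mono, but not on another runtime, would point at the GroupBy implementation rather than at the data access.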
