Why is double GroupBy + ToList taking too long?

I have these queries:

    var Data = (from ftr in db.TB_FTR
                join mst in db.TB_MST on ftr.MST_ID equals mst.MST_ID
                join trf in db.TB_TRF on mst.TRF_ID equals trf.ID
                select new CityCountyType { City = ftr.CITY, County = ftr.COUNTY, Type = trf.TYPE }
               ).OrderBy(i => i.City).ThenBy(i => i.County);

    var Data2 =
        Data.GroupBy(i => new {i.City, i.County, i.Type})
            .Select(group => new {Name = group.Key, Count = group.Count()})
            .OrderBy(x => x.Name)
            .ThenByDescending(x => x.Count)
            .GroupBy(g => new {g.Name.City, g.Name.County})
            .Select(g => g.Select(g2 =>
                new {Name = new {g.Key.City, g.Key.County, g2.Name.Type}, g2.Count})).ToList();

I'm trying to get a list of lists of objects whose counties and cities are the same. But the second query takes too long to return a result: I waited about 30 minutes and got no answer, even though Data has only about 5,000 records. How can I change these queries so that I get the list of lists I want? Thanks in advance.

For example, this query returns a list like this:

{ Name = {{ City = New York City, County = Bronx, Type = Type A }}, Count = 4 }

{ Name = {{ City = New York City, County = Bronx, Type = Type B }}, Count = 8 }

{ Name = {{ City = New York City, County = Bronx, Type = Type C }}, Count = 24 }

{ Name = {{ City = New York City, County = Manhattan, Type = Type B }}, Count = 43 }

{ Name = {{ City = New York City, County = Manhattan, Type = Type C }}, Count = 58 }

{ Name = {{ City = Seattle, County = King County, Type = Type D }}, Count = 43 }

{ Name = {{ City = Seattle, County = King County, Type = Type A }}, Count = 67 }

{ Name = {{ City = Seattle, County = Snohomish County, Type = Type C }}, Count = 67 }

I want to split this list into several lists, like this:

List 1:

{ Name = {{ City = New York City, County = Bronx, Type = Type A }}, Count = 4 }

{ Name = {{ City = New York City, County = Bronx, Type = Type B }}, Count = 8 }

{ Name = {{ City = New York City, County = Bronx, Type = Type C }}, Count = 24 }

List 2:

{ Name = {{ City = New York City, County = Manhattan, Type = Type B }}, Count = 43 }

{ Name = {{ City = New York City, County = Manhattan, Type = Type C }}, Count = 58 }

List 3:

{ Name = {{ City = Seattle, County = King County, Type = Type D }}, Count = 43 }

{ Name = {{ City = Seattle, County = King County, Type = Type A }}, Count = 67 }

List 4:

{ Name = {{ City = Seattle, County = Snohomish County, Type = Type C }}, Count = 67 }

Possibility 1: Your database isn't indexed to support your query (where and join clauses).

To find out, get the generated SQL and look at the execution plan. If the plan shows a nested loop join over a clustered index scan, you've found the problem.

Possibility 2: You've found the n+1 problem.

In LINQ, a group consists of a group key and the group's members. In most SQL implementations, however, GROUP BY gives you only the group keys and aggregates. To get the members of a group, a separate query is issued. If there are n groups, then n queries must be issued (the +1 is the original query).
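To make the difference concrete, here is a small in-memory sketch. The sample rows are made up for illustration and are not taken from the question's database. The first query keeps every group's members, which is the part SQL's GROUP BY has no direct way to return; the second projects only a key and a count, which is the shape a provider can usually translate into a single GROUP BY.

```csharp
using System;
using System.Linq;

// Hypothetical sample rows, standing in for the joined table data.
var sample = new[]
{
    new { County = "Bronx", Type = "Type A" },
    new { County = "Bronx", Type = "Type B" },
    new { County = "Manhattan", Type = "Type B" },
};

// A LINQ group is a key plus the full sequence of member rows.
var groups = sample.GroupBy(r => r.County).ToList();
foreach (var g in groups)
    Console.WriteLine($"{g.Key}: {string.Join(", ", g.Select(r => r.Type))}");

// The SQL-shaped result: key plus aggregate only, the members are gone.
var counts = sample.GroupBy(r => r.County)
                   .Select(g => new { County = g.Key, Count = g.Count() })
                   .ToList();
foreach (var c in counts)
    Console.WriteLine($"{c.County}: {c.Count}");
```

When a query against the database asks for the members (as the nested `g.Select(...)` in the question does), the provider has to fetch them some other way, and that is where the extra queries can come from.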

To find out, get the generated SQL. If a bunch of extra queries are being issued and any of them show a clustered index scan, you've found the problem.

Possibility 3: You're actually issuing n^2 queries (up to ~25,000,000 if n is close to the 5,000 records).

Well, you grouped twice, so it might be a doubly nested loop. Look at the generated SQL and find out.


The simplest fix for all of these is to pull the 5,000 records into memory before grouping. An easy way to do this is to call ToList before calling GroupBy.
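As a rough sketch of that fix, assuming the joined rows fit comfortably in memory: the hard-coded `rows` list below is a hypothetical stand-in for `Data.ToList()`, and its values are invented. One thing changes once the grouping runs in memory: sorting by an anonymous-type key, like the question's `OrderBy(x => x.Name)`, would throw at run time in LINQ to Objects because anonymous types don't implement IComparable. The sketch therefore sorts by City and County explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Data.ToList(): in the real code that call runs
// the join once and materializes the ~5,000 rows.
var rows = new List<(string City, string County, string Type)>
{
    ("New York City", "Bronx", "Type A"),
    ("New York City", "Bronx", "Type B"),
    ("New York City", "Bronx", "Type B"),
    ("New York City", "Manhattan", "Type C"),
    ("Seattle", "King County", "Type D"),
};

// Everything below is LINQ to Objects, so no further SQL is issued.
var lists = rows
    .GroupBy(r => new { r.City, r.County, r.Type })
    .Select(g => new { Name = g.Key, Count = g.Count() })
    .GroupBy(x => new { x.Name.City, x.Name.County })
    .OrderBy(g => g.Key.City).ThenBy(g => g.Key.County)
    .Select(g => g.OrderByDescending(x => x.Count).ToList())
    .ToList();

// One inner list per (City, County) pair, each holding the per-Type counts.
foreach (var list in lists)
{
    Console.WriteLine($"{list[0].Name.City} / {list[0].Name.County}");
    foreach (var item in list)
        Console.WriteLine($"  {item.Name.Type}: {item.Count}");
}
```

In the question's code the equivalent change would be to start the second query from `Data.ToList()` instead of `Data`. The trade-off is that the counting then happens in the application rather than in the database, which should be fine for a few thousand rows.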
