Optimize/rewrite LINQ query with GROUP BY and COUNT
I'm trying to count the unique Foos and Bars, grouped by Name, in the following dataset.
Id | IsActive | Name | Foo | Bar
1 | 1 | A | 11 | null
2 | 1 | A | 11 | null
3 | 1 | A | null | 123
4 | 1 | B | null | 321
For the data above, I expect this result:
Expected:
A = 2;
B = 1;
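I tried grouping by Name, Foo and Bar, then grouping by Name again and using a count to get the "row" count. But that didn't give me the correct result. (Or ToDictionary threw a duplicate key exception; I played around with it a lot, so I don't remember.)
db.MyEntity
    .Where(x => x.IsActive)
    .GroupBy(x => new { x.Name, x.Foo, x.Bar })
    .GroupBy(x => new { x.Key.Name, Count = x.Count() })
    .ToDictionary(x => x.Key, x => x.Count);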
So I came up with this LINQ query. But it is rather slow.
db.MyEntity
    .Where(x => x.IsActive)
    .GroupBy(x => x.Name)
    .ToDictionary(x => x.Key,
        x =>
            x.Where(y => y.Foo != null).Select(y => y.Foo).Distinct().Count() +
            x.Where(y => y.Bar != null).Select(y => y.Bar).Distinct().Count());
How can I optimize it?
Here is the entity in question:
public class MyEntity
{
    public int Id { get; set; }
    public bool IsActive { get; set; }
    public string Name { get; set; }
    public int? Foo { get; set; }
    public int? Bar { get; set; }
}
I also tried this query
db.MyEntity
    .Where(x => x.IsActive)
    .GroupBy(x => new { x.Name, x.Foo, x.Bar })
    .GroupBy(x => x.Key.Name)
    .ToDictionary(x => x.Key, x => x.Count());
but that threw a timeout exception :(
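The query is highly inefficient because you do most of the work on the client side (everything involved in building the dictionary) instead of letting the database do the projection. That is a problem for two reasons: the database can do this work much faster than the client, particularly if these values are indexed, and projecting in the database means far less data is sent over the network.
So just do the projection before you group the data.
var activeItems = db.MyEntity.Where(x => x.IsActive);
// Project each column to a distinct (Name, Value) pair, combine both sets,
// drop the null values, then count the remaining pairs per Name.
var query = activeItems.Select(x => new { x.Name, Value = x.Foo }).Distinct()
    .Concat(activeItems.Select(x => new { x.Name, Value = x.Bar }).Distinct())
    .Where(x => x.Value != null)
    .GroupBy(pair => pair.Name)
    .Select(group => new { group.Key, Count = group.Count() })
    .ToDictionary(pair => pair.Key, pair => pair.Count);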
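To check the logic of the projection approach without a database, here is a minimal sketch that runs the same operator chain with LINQ to Objects over the sample rows from the question. The class name `DistinctCountDemo`, the helper `SampleRows` and the in-memory list are assumptions made for illustration. It only verifies the result, not the SQL a real provider would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyEntity
{
    public int Id { get; set; }
    public bool IsActive { get; set; }
    public string Name { get; set; }
    public int? Foo { get; set; }
    public int? Bar { get; set; }
}

public static class DistinctCountDemo
{
    // In-memory stand-in for db.MyEntity, holding the rows from the question.
    public static List<MyEntity> SampleRows() => new List<MyEntity>
    {
        new MyEntity { Id = 1, IsActive = true, Name = "A", Foo = 11 },
        new MyEntity { Id = 2, IsActive = true, Name = "A", Foo = 11 },
        new MyEntity { Id = 3, IsActive = true, Name = "A", Bar = 123 },
        new MyEntity { Id = 4, IsActive = true, Name = "B", Bar = 321 },
    };

    public static Dictionary<string, int> CountDistinct(IEnumerable<MyEntity> rows)
    {
        var activeItems = rows.Where(x => x.IsActive);

        // Both projections produce the same anonymous type (string Name, int? Value),
        // so the two sequences can be concatenated.
        return activeItems.Select(x => new { x.Name, Value = x.Foo }).Distinct()
            .Concat(activeItems.Select(x => new { x.Name, Value = x.Bar }).Distinct())
            .Where(pair => pair.Value != null)
            .GroupBy(pair => pair.Name)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        foreach (var kv in CountDistinct(SampleRows()).OrderBy(kv => kv.Key))
            Console.WriteLine($"{kv.Key} = {kv.Value}");
    }
}
```

On the sample data this yields A = 2 and B = 1, matching the expected result in the question.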
Your goal is to produce the following query:
select Name, count(distinct Foo) + count(distinct Bar)
from myEntity
where IsActive = 1
group by Name
This is the minimal query that gets what you need. But LINQ seems to complicate everything as much as possible :)
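What that SQL computes can be spelled out in memory. In SQL, COUNT(DISTINCT col) skips NULLs and collapses duplicates, so the sketch below filters out nulls before calling Distinct. The names `TargetQuerySemantics`, `Evaluate` and `CountDistinctNonNull` are hypothetical, and the tuple rows stand in for the table. The extra inactive row is an assumption added to show the IsActive filter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TargetQuerySemantics
{
    // Mirrors SQL COUNT(DISTINCT col): NULLs are ignored, duplicates count once.
    private static int CountDistinctNonNull(IEnumerable<int?> values) =>
        values.Where(v => v.HasValue).Distinct().Count();

    // Mirrors: select Name, count(distinct Foo) + count(distinct Bar)
    //          from myEntity where IsActive = 1 group by Name
    public static Dictionary<string, int> Evaluate(
        IEnumerable<(bool IsActive, string Name, int? Foo, int? Bar)> rows) =>
        rows.Where(r => r.IsActive)
            .GroupBy(r => r.Name)
            .ToDictionary(
                g => g.Key,
                g => CountDistinctNonNull(g.Select(r => r.Foo)) +
                     CountDistinctNonNull(g.Select(r => r.Bar)));

    public static void Main()
    {
        var rows = new List<(bool IsActive, string Name, int? Foo, int? Bar)>
        {
            (true, "A", 11, null),
            (true, "A", 11, null),
            (true, "A", null, 123),
            (true, "B", null, 321),
            (false, "B", 99, null), // inactive, so it is filtered out
        };

        foreach (var kv in Evaluate(rows).OrderBy(kv => kv.Key))
            Console.WriteLine($"{kv.Key} = {kv.Value}");
    }
}
```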
The aim is to do as much of this as possible at the database level. Right now your query is translated to:
SELECT
[Project2].[C1] AS [C1],
[Project2].[Name] AS [Name],
[Project2].[C2] AS [C2],
[Project2].[id] AS [id],
[Project2].[IsActive] AS [IsActive],
[Project2].[Name1] AS [Name1],
[Project2].[Foo] AS [Foo],
[Project2].[Bar] AS [Bar]
FROM ( SELECT
[Distinct1].[Name] AS [Name],
1 AS [C1],
[Extent2].[id] AS [id],
[Extent2].[IsActive] AS [IsActive],
[Extent2].[Name] AS [Name1],
[Extent2].[Foo] AS [Foo],
[Extent2].[Bar] AS [Bar],
CASE WHEN ([Extent2].[id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM (SELECT DISTINCT
[Extent1].[Name] AS [Name]
FROM [dbo].[SomeTable] AS [Extent1]
WHERE [Extent1].[IsActive] = 1 ) AS [Distinct1]
LEFT OUTER JOIN [dbo].[SomeTable] AS [Extent2] ON ([Extent2].[IsActive] = 1) AND ([Distinct1].[Name] = [Extent2].[Name])
) AS [Project2]
ORDER BY [Project2].[Name] ASC, [Project2].[C2] ASC
It selects everything from the database and performs the grouping in the application layer, which is inefficient.
@Servy's query:
var activeItems = db.MyEntity.Where(x => x.IsActive);
var query = activeItems.Select(x => new { x.Name, Value = x.Foo }).Distinct()
    .Concat(activeItems.Select(x => new { x.Name, Value = x.Bar }).Distinct())
    .Where(x => x.Value != null)
    .GroupBy(pair => pair.Name)
    .Select(group => new { group.Key, Count = group.Count() })
    .ToDictionary(pair => pair.Key, pair => pair.Count);
is translated to:
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [C2],
[GroupBy1].[A1] AS [C3]
FROM ( SELECT
[UnionAll1].[Name] AS [K1],
COUNT(1) AS [A1]
FROM (SELECT
[Distinct1].[Name] AS [Name]
FROM ( SELECT DISTINCT
[Extent1].[Name] AS [Name],
[Extent1].[Foo] AS [Foo]
FROM [dbo].[SomeTable] AS [Extent1]
WHERE ([Extent1].[IsActive] = 1) AND ([Extent1].[Foo] IS NOT NULL)
) AS [Distinct1]
UNION ALL
SELECT
[Distinct2].[Name] AS [Name]
FROM ( SELECT DISTINCT
[Extent2].[Name] AS [Name],
[Extent2].[Bar] AS [Bar]
FROM [dbo].[SomeTable] AS [Extent2]
WHERE ([Extent2].[IsActive] = 1) AND ([Extent2].[Bar] IS NOT NULL)
) AS [Distinct2]) AS [UnionAll1]
GROUP BY [UnionAll1].[Name]
) AS [GroupBy1]
That is much better.
I tried the following:
var activeItems = (from o in db.SomeTables
                   where o.IsActive
                   group o by o.Name into gr
                   select new
                   {
                       gr.Key,
                       cc = gr.Select(c => c.Foo).Distinct().Count(c => c != null) +
                            gr.Select(c => c.Bar).Distinct().Count(c => c != null)
                   }).ToDictionary(c => c.Key);
This is translated to:
SELECT
1 AS [C1],
[Project5].[Name] AS [Name],
[Project5].[C1] + [Project5].[C2] AS [C2]
FROM ( SELECT
[Project3].[Name] AS [Name],
[Project3].[C1] AS [C1],
(SELECT
COUNT(1) AS [A1]
FROM ( SELECT DISTINCT
[Extent3].[Bar] AS [Bar]
FROM [dbo].[SomeTable] AS [Extent3]
WHERE ([Extent3].[IsActive] = 1) AND ([Project3].[Name] = [Extent3].[Name]) AND ([Extent3].[Bar] IS NOT NULL)
) AS [Distinct3]) AS [C2]
FROM ( SELECT
[Distinct1].[Name] AS [Name],
(SELECT
COUNT(1) AS [A1]
FROM ( SELECT DISTINCT
[Extent2].[Foo] AS [Foo]
FROM [dbo].[SomeTable] AS [Extent2]
WHERE ([Extent2].[IsActive] = 1) AND ([Distinct1].[Name] = [Extent2].[Name]) AND ([Extent2].[Foo] IS NOT NULL)
) AS [Distinct2]) AS [C1]
FROM ( SELECT DISTINCT
[Extent1].[Name] AS [Name]
FROM [dbo].[SomeTable] AS [Extent1]
WHERE [Extent1].[IsActive] = 1
) AS [Distinct1]
) AS [Project3]
) AS [Project5]
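Roughly the same, but without the UNION of the second version.
Conclusion:
If the table is very large and performance is critical, I would create a view and import it into the model. Otherwise, stick with the third version or @Servy's second version. Of course, the performance should be tested.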
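I think you can modify your initial query slightly to get what you want:
db.MyEntity
    .Where(x => x.IsActive)
    .GroupBy(x => new { x.Name, x.Foo, x.Bar })
    .GroupBy(x => x.Key.Name)
    .ToDictionary(x => x.Key, x => x.Count());
When you added Count() to the second grouping, you were counting the duplicate values of the three-part key. You only want to count the distinct values of each three-part key, so do the count after grouping by Name.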
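The double grouping can be tried in memory as a sketch. The class name `DoubleGroupingDemo`, the method `CountDistinctKeys` and the tuple rows are assumptions for illustration. Note that this counts distinct (Name, Foo, Bar) combinations. That equals the sum of distinct Foos and distinct Bars on the sample data, where each row sets only one of the two columns, but it need not match when a row sets both.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DoubleGroupingDemo
{
    public static Dictionary<string, int> CountDistinctKeys(
        IEnumerable<(bool IsActive, string Name, int? Foo, int? Bar)> rows)
    {
        return rows
            .Where(x => x.IsActive)
            // First grouping collapses duplicate (Name, Foo, Bar) rows.
            .GroupBy(x => new { x.Name, x.Foo, x.Bar })
            // Second grouping collects the distinct keys per Name.
            .GroupBy(g => g.Key.Name)
            // Count the distinct keys, not the rows inside them.
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        var rows = new List<(bool IsActive, string Name, int? Foo, int? Bar)>
        {
            (true, "A", 11, null),
            (true, "A", 11, null),
            (true, "A", null, 123),
            (true, "B", null, 321),
        };

        foreach (var kv in CountDistinctKeys(rows).OrderBy(kv => kv.Key))
            Console.WriteLine($"{kv.Key} = {kv.Value}");
    }
}
```

For example, two rows (A, 11, 1) and (A, 11, 2) give a count of 2 here, while one distinct Foo plus two distinct Bars would be 3. It is worth checking which of the two the data actually calls for.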
Just a suggestion regarding the question: for better performance, avoid DISTINCT and use grouping instead.
Please see this link.