LINQ: Select the Min and Max values from a collection on an entity after grouping
LINQ - filtering, grouping and getting Min and Max value
Suppose I have an EF entity class that represents a value at some point in time:
public class Point
{
    public DateTime DT { get; set; }
    public decimal Value { get; set; }
}
I also have a class that represents a period of time:
public class Period
{
    public DateTime Begin { get; set; }
    public DateTime End { get; set; }
}
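Then I have an array of Period objects that can contain some specific time slices. Let's say it looks like this (the Period objects are always in ascending order in the array):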
var periodSlices = new Period[]
{
    new Period { Begin = new DateTime(2016, 10, 1), End = new DateTime(2016, 10, 15) },
    new Period { Begin = new DateTime(2016, 10, 16), End = new DateTime(2016, 10, 20) },
    new Period { Begin = new DateTime(2016, 10, 21), End = new DateTime(2016, 12, 30) }
};
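Now, using LINQ to SQL, how can I write a query that filters the Point rows and groups them by each of the periodSlices, returning the oldest (min) and latest (max) value in each slice? In this example scenario the result should have 3 groups, each with its min and max point (if any).
So what I need is something like IQueryable<Period, IEnumerable<Point>>.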
This is how I do it right now, but the performance is not the best:
using (var context = new EfDbContext())
{
    var periodBegin = periodSlices[0].Begin;
    var periodEnd = periodSlices[periodSlices.Length - 1].End;
    // Loads every point in the overall range into memory.
    var dbPoints = context.Points.Where(p => p.DT >= periodBegin && p.DT <= periodEnd).ToArray();
    foreach (var slice in periodSlices)
    {
        // In-memory filter per slice.
        var points = dbPoints.Where(p => p.DT >= slice.Begin && p.DT <= slice.End);
        if (points.Any())
        {
            // MinBy/MaxBy are not part of System.Linq before .NET 6 (MoreLINQ, for example, provides them).
            var latestValue = points.MaxBy(u => u.DT).Value;
            var earliestValue = points.MinBy(u => u.DT).Value;
        }
    }
}
Performance is crucial (the faster the better, since I need to filter and group around 100k points).
Here is a solution that uses a single SQL query:
var baseQueries = periodSlices
    .Select(slice => db.Points
        .Select(p => new { Period = new Period { Begin = slice.Begin, End = slice.End }, p.DT })
        .Where(p => p.DT >= p.Period.Begin && p.DT <= p.Period.End)
    );
var unionQuery = baseQueries
    .Aggregate(Queryable.Concat);
var periodQuery = unionQuery
    .GroupBy(p => p.Period)
    .Select(g => new
    {
        Period = g.Key,
        MinDT = g.Min(p => p.DT),
        MaxDT = g.Max(p => p.DT),
    });
var finalQuery =
    from p in periodQuery
    join pMin in db.Points on p.MinDT equals pMin.DT
    join pMax in db.Points on p.MaxDT equals pMax.DT
    select new
    {
        Period = p.Period,
        EarliestPoint = pMin,
        LatestPoint = pMax,
    };
For readability, I have split the LINQ query into separate variables. To get the result, only the final query should be executed:
var result = finalQuery.ToList();
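Basically, we build a UNION ALL query with one branch per slice, then determine the min and max dates for each period, and finally fetch the corresponding values for those dates. I used a join rather than the "typical" OrderBy(Descending) + FirstOrDefault() inside the grouping, because the latter generates horrible SQL.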
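The three steps can be tried out without a database. The following is a minimal LINQ to Objects sketch of the same shape (concatenate per-slice filters, group, join back), not the EF query itself. The names SliceDemo and MinMaxPerSlice and the sample values are made up for illustration. Like the query above, it assumes DT values are unique; duplicate timestamps would produce extra rows from the joins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Point
{
    public DateTime DT { get; set; }
    public decimal Value { get; set; }
}

public class Period
{
    public DateTime Begin { get; set; }
    public DateTime End { get; set; }
}

// Hypothetical helper, in-memory only.
public static class SliceDemo
{
    public static List<(DateTime Begin, DateTime End, decimal Earliest, decimal Latest)>
        MinMaxPerSlice(IEnumerable<Point> source, IEnumerable<Period> slices)
    {
        var points = source.ToList();

        // Step 1: one filtered sequence per slice, concatenated (the UNION ALL part).
        // Aggregate throws InvalidOperationException if there are no slices.
        var union = slices
            .Select(s => points
                .Where(p => p.DT >= s.Begin && p.DT <= s.End)
                .Select(p => new { s.Begin, s.End, p.DT }))
            .Aggregate((a, b) => a.Concat(b));

        // Step 2: min and max timestamp per slice (the GROUP BY part).
        var perSlice = union
            .GroupBy(x => new { x.Begin, x.End })
            .Select(g => new
            {
                g.Key.Begin,
                g.Key.End,
                MinDT = g.Min(x => x.DT),
                MaxDT = g.Max(x => x.DT),
            });

        // Step 3: join back to the points to fetch the values at those timestamps.
        var rows =
            from s in perSlice
            join pMin in points on s.MinDT equals pMin.DT
            join pMax in points on s.MaxDT equals pMax.DT
            select (Begin: s.Begin, End: s.End, Earliest: pMin.Value, Latest: pMax.Value);

        return rows.ToList();
    }

    public static void Main()
    {
        // Made-up sample data.
        var points = new[]
        {
            new Point { DT = new DateTime(2016, 10, 2), Value = 1m },
            new Point { DT = new DateTime(2016, 10, 10), Value = 2m },
            new Point { DT = new DateTime(2016, 10, 17), Value = 3m },
            new Point { DT = new DateTime(2016, 12, 1), Value = 4m },
            new Point { DT = new DateTime(2016, 12, 25), Value = 5m },
        };
        var slices = new[]
        {
            new Period { Begin = new DateTime(2016, 10, 1), End = new DateTime(2016, 10, 15) },
            new Period { Begin = new DateTime(2016, 10, 16), End = new DateTime(2016, 10, 20) },
            new Period { Begin = new DateTime(2016, 10, 21), End = new DateTime(2016, 12, 30) },
        };

        foreach (var row in MinMaxPerSlice(points, slices))
        {
            Console.WriteLine($"{row.Begin:d} - {row.End:d}: earliest={row.Earliest}, latest={row.Latest}");
        }
    }
}
```

Slices that contain no points simply do not appear in the result, which matches the "if any" requirement in the question.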
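Now, the main question. I can't say whether this will be faster than the original approach. It depends on whether the DT column is indexed and on the number of periodSlices, because each slice adds another UNION ALL SELECT from the source table to the query. For 3 slices the generated SQL looks like this: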
SELECT
    [GroupBy1].[K1] AS [C1],
    [GroupBy1].[K2] AS [C2],
    [GroupBy1].[K3] AS [C3],
    [Extent4].[DT] AS [DT],
    [Extent4].[Value] AS [Value],
    [Extent5].[DT] AS [DT1],
    [Extent5].[Value] AS [Value1]
    FROM (SELECT
        [UnionAll2].[C1] AS [K1],
        [UnionAll2].[C2] AS [K2],
        [UnionAll2].[C3] AS [K3],
        MIN([UnionAll2].[DT]) AS [A1],
        MAX([UnionAll2].[DT]) AS [A2]
        FROM (SELECT
            1 AS [C1],
            @p__linq__0 AS [C2],
            @p__linq__1 AS [C3],
            [Extent1].[DT] AS [DT]
            FROM [dbo].[Point] AS [Extent1]
            WHERE ([Extent1].[DT] >= @p__linq__0) AND ([Extent1].[DT] <= @p__linq__1)
        UNION ALL
            SELECT
            1 AS [C1],
            @p__linq__2 AS [C2],
            @p__linq__3 AS [C3],
            [Extent2].[DT] AS [DT]
            FROM [dbo].[Point] AS [Extent2]
            WHERE ([Extent2].[DT] >= @p__linq__2) AND ([Extent2].[DT] <= @p__linq__3)
        UNION ALL
            SELECT
            1 AS [C1],
            @p__linq__4 AS [C2],
            @p__linq__5 AS [C3],
            [Extent3].[DT] AS [DT]
            FROM [dbo].[Point] AS [Extent3]
            WHERE ([Extent3].[DT] >= @p__linq__4) AND ([Extent3].[DT] <= @p__linq__5)) AS [UnionAll2]
        GROUP BY [UnionAll2].[C1], [UnionAll2].[C2], [UnionAll2].[C3] ) AS [GroupBy1]
    INNER JOIN [dbo].[Point] AS [Extent4] ON [GroupBy1].[A1] = [Extent4].[DT]
    INNER JOIN [dbo].[Point] AS [Extent5] ON [GroupBy1].[A2] = [Extent5].[DT]
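If you want the earliest (min) and latest (max) point in each time slice, the first thing I would look at is getting the database to do more of the work.
When you call .ToArray(), it brings all of the selected points into memory. That is pointless, since you only need 2 per slice. So if you did something like this: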
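foreach (var slice in periodSlices)
{
    var q = context
        .Points
        .Where(p => p.DT >= slice.Begin && p.DT <= slice.End);
    // Each FirstOrDefault() runs its own query and returns at most one row.
    var min = q.OrderBy(x => x.DT).FirstOrDefault();
    // EF6 LINQ to Entities cannot translate LastOrDefault(), so sort descending and take the first row.
    var max = q.OrderByDescending(x => x.DT).FirstOrDefault();
}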
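it might be better.
I say might because it depends on which indexes exist in the database and on how many points fall in each slice. Ultimately, to get really good performance you may have to add an index on the datetime column, change the structure so that the min and max are stored in advance, or do it in a stored proc.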
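As one possible reading of "stored in advance", here is a minimal in-memory sketch of a per-period summary that is updated each time a point is written. PeriodSummary and SummaryUpdater are hypothetical names, not part of the question's model, and the sketch assumes the period boundaries are known up front and that points are only ever added (a deleted or edited point would require recomputing the summary).

```csharp
using System;

public class Point
{
    public DateTime DT { get; set; }
    public decimal Value { get; set; }
}

// Hypothetical summary row: one per period, kept alongside the points.
public class PeriodSummary
{
    public DateTime Begin { get; set; }
    public DateTime End { get; set; }
    public Point Earliest { get; set; }
    public Point Latest { get; set; }
}

public static class SummaryUpdater
{
    // Call once for every point that is inserted.
    public static void Apply(PeriodSummary summary, Point p)
    {
        // Ignore points outside this period (bounds are inclusive, as in the question).
        if (p.DT < summary.Begin || p.DT > summary.End) return;

        if (summary.Earliest == null || p.DT < summary.Earliest.DT) summary.Earliest = p;
        if (summary.Latest == null || p.DT > summary.Latest.DT) summary.Latest = p;
    }

    public static void Main()
    {
        var summary = new PeriodSummary
        {
            Begin = new DateTime(2016, 10, 1),
            End = new DateTime(2016, 10, 15),
        };

        // Made-up sample points; the last one falls outside the period.
        Apply(summary, new Point { DT = new DateTime(2016, 10, 5), Value = 2m });
        Apply(summary, new Point { DT = new DateTime(2016, 10, 2), Value = 1m });
        Apply(summary, new Point { DT = new DateTime(2016, 10, 20), Value = 9m });

        Console.WriteLine($"earliest={summary.Earliest.Value}, latest={summary.Latest.Value}");
    }
}
```

With summaries maintained this way, reading the min and max for N slices means reading N small rows instead of scanning the points table, at the cost of extra work on every insert.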