LINQ - filtering, grouping and getting Min and Max value

Let's say that I have an EF entity class that represents some value in time:

public class Point
{
    public DateTime DT {get; set;}
    public decimal Value {get; set;}
}

I also have a class that represents a time period:

public class Period
{
    public DateTime Begin {get; set;}
    public DateTime End {get; set;}
}

Then I have an array of Periods that contains specific time slices. Let's say it looks like this (the Period objects are always in ascending order in the array):

var periodSlices = new Period [] 
{
    new Period { Begin = new DateTime(2016, 10, 1), End = new DateTime(2016, 10, 15)},
    new Period { Begin = new DateTime(2016, 10, 16), End = new DateTime(2016, 10, 20)},
    new Period { Begin = new DateTime(2016, 10, 21), End = new DateTime(2016, 12, 30)}
};

Now, using LINQ to SQL, how can I write a query that filters the Points and groups them so that I get the oldest (min) and latest (max) point within each of the periodSlices? In this example scenario the result should contain 3 groups, each with its min and max points (if there are any, of course).

So what I need as a result is something like IQueryable<Period, IEnumerable<Point>>.

Right now I am doing it this way, but the performance is not the greatest:

using (var context = new EfDbContext())
{
    var periodBegin = periodSlices[0].Begin;
    var periodEnd = periodSlices[periodSlices.Length - 1].End;

    var dbPoints = context.Points.Where(p => p.DT >= periodBegin && p.DT <= periodEnd).ToArray();

    foreach (var slice in periodSlices)
    {
        var points = dbPoints.Where(p => p.DT >= slice.Begin && p.DT <= slice.End);

        if (points.Any())
        {
            var latestValue = points.MaxBy(u => u.DT).Value;
            var earliestValue = points.MinBy(u => u.DT).Value;
        }
    }   
}

Performance is crucial (the faster the better, as I need to filter and group ~100k points).

Here is a single SQL query solution:

var baseQueries = periodSlices
    .Select(slice => db.Points
        .Select(p => new { Period = new Period { Begin = slice.Begin, End = slice.End }, p.DT })
        .Where(p => p.DT >= p.Period.Begin && p.DT <= p.Period.End)
    );

var unionQuery = baseQueries
    .Aggregate(Queryable.Concat);

var periodQuery = unionQuery
    .GroupBy(p => p.Period)
    .Select(g => new
    {
        Period = g.Key,
        MinDT = g.Min(p => p.DT),
        MaxDT = g.Max(p => p.DT),
    });

var finalQuery =
    from p in periodQuery
    join pMin in db.Points on p.MinDT equals pMin.DT
    join pMax in db.Points on p.MaxDT equals pMax.DT
    select new
    {
        Period = p.Period,
        EarliestPoint = pMin,
        LatestPoint = pMax,
    };

I've separated the LINQ query parts into separate variables just for readability. To get the result, only the final query should be executed:

var result = finalQuery.ToList();

Basically we build a UNION ALL query with one branch per slice, then determine the minimum and maximum dates for each period, and finally get the corresponding values for these dates. I've used join instead of the "typical" OrderBy(Descending) + FirstOrDefault() inside the grouping because the latter generates terrible SQL.
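To see the shape of that pipeline without a database, here is a minimal sketch that runs the same steps (one filtered query per slice, Concat, GroupBy with Min/Max, join back on DT) against an in-memory array via AsQueryable(). The SliceDemo class, its Describe method and the sample points are made up for illustration. An anonymous type is used as the grouping key instead of Period, because LINQ to Objects would compare Period instances by reference. It only illustrates the logic and says nothing about how the generated SQL performs.

```csharp
using System;
using System.Globalization;
using System.Linq;

public class Point
{
    public DateTime DT { get; set; }
    public decimal Value { get; set; }
}

public class Period
{
    public DateTime Begin { get; set; }
    public DateTime End { get; set; }
}

public static class SliceDemo
{
    // Hypothetical sample data standing in for db.Points.
    static readonly Point[] Points =
    {
        new Point { DT = new DateTime(2016, 10, 2),  Value = 1m },
        new Point { DT = new DateTime(2016, 10, 9),  Value = 2m },
        new Point { DT = new DateTime(2016, 10, 17), Value = 3m },
        new Point { DT = new DateTime(2016, 11, 5),  Value = 4m },
        new Point { DT = new DateTime(2016, 12, 1),  Value = 5m },
    };

    static readonly Period[] Slices =
    {
        new Period { Begin = new DateTime(2016, 10, 1),  End = new DateTime(2016, 10, 15) },
        new Period { Begin = new DateTime(2016, 10, 16), End = new DateTime(2016, 10, 20) },
        new Period { Begin = new DateTime(2016, 10, 21), End = new DateTime(2016, 12, 30) },
    };

    public static string[] Describe()
    {
        var source = Points.AsQueryable();

        // One filtered query per slice; every row is tagged with its slice bounds.
        var baseQueries = Slices.Select(slice => source
            .Where(p => p.DT >= slice.Begin && p.DT <= slice.End)
            .Select(p => new { slice.Begin, slice.End, p.DT }));

        // Fold the per-slice queries into one (UNION ALL when translated to SQL).
        var unionQuery = baseQueries.Aggregate(Queryable.Concat);

        // Earliest and latest timestamp per slice.
        var periodQuery = unionQuery
            .GroupBy(p => new { p.Begin, p.End })
            .Select(g => new
            {
                g.Key.Begin,
                g.Key.End,
                MinDT = g.Min(p => p.DT),
                MaxDT = g.Max(p => p.DT),
            });

        // Join back on DT to fetch the full points for those timestamps.
        var finalQuery =
            from p in periodQuery
            join pMin in source on p.MinDT equals pMin.DT
            join pMax in source on p.MaxDT equals pMax.DT
            select new { p.Begin, p.End, Earliest = pMin.Value, Latest = pMax.Value };

        return finalQuery
            .AsEnumerable()
            .Select(r => string.Format(CultureInfo.InvariantCulture,
                "{0:yyyy-MM-dd}..{1:yyyy-MM-dd}: earliest={2}, latest={3}",
                r.Begin, r.End, r.Earliest, r.Latest))
            .ToArray();
    }

    public static void Main()
    {
        foreach (var line in Describe())
            Console.WriteLine(line);
    }
}
```

One thing the sketch makes visible: because the last step joins on DT, several points sharing the same timestamp would each produce a row for that slice, so this shape assumes DT values are unique (or that duplicates are acceptable).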

Now, the main question. I can't say whether this would be faster than the original approach. It depends on whether the DT column is indexed and on the number of periodSlices, because each slice adds another UNION ALL SELECT from the source table to the query. For 3 slices the generated SQL looks like this:

SELECT
    [GroupBy1].[K1] AS [C1],
    [GroupBy1].[K2] AS [C2],
    [GroupBy1].[K3] AS [C3],
    [Extent4].[DT] AS [DT],
    [Extent4].[Value] AS [Value],
    [Extent5].[DT] AS [DT1],
    [Extent5].[Value] AS [Value1]
    FROM    (SELECT
        [UnionAll2].[C1] AS [K1],
        [UnionAll2].[C2] AS [K2],
        [UnionAll2].[C3] AS [K3],
        MIN([UnionAll2].[DT]) AS [A1],
        MAX([UnionAll2].[DT]) AS [A2]
        FROM  (SELECT
            1 AS [C1],
            @p__linq__0 AS [C2],
            @p__linq__1 AS [C3],
            [Extent1].[DT] AS [DT]
            FROM [dbo].[Point] AS [Extent1]
            WHERE ([Extent1].[DT] >= @p__linq__0) AND ([Extent1].[DT] <= @p__linq__1)
        UNION ALL
            SELECT
            1 AS [C1],
            @p__linq__2 AS [C2],
            @p__linq__3 AS [C3],
            [Extent2].[DT] AS [DT]
            FROM [dbo].[Point] AS [Extent2]
            WHERE ([Extent2].[DT] >= @p__linq__2) AND ([Extent2].[DT] <= @p__linq__3)
        UNION ALL
            SELECT
            1 AS [C1],
            @p__linq__4 AS [C2],
            @p__linq__5 AS [C3],
            [Extent3].[DT] AS [DT]
            FROM [dbo].[Point] AS [Extent3]
            WHERE ([Extent3].[DT] >= @p__linq__4) AND ([Extent3].[DT] <= @p__linq__5)) AS [UnionAll2]
        GROUP BY [UnionAll2].[C1], [UnionAll2].[C2], [UnionAll2].[C3] ) AS [GroupBy1]
    INNER JOIN [dbo].[Point] AS [Extent4] ON [GroupBy1].[A1] = [Extent4].[DT]
    INNER JOIN [dbo].[Point] AS [Extent5] ON [GroupBy1].[A2] = [Extent5].[DT]

If you want to get the earliest (min) and latest (max) point in each time slice, the first thing I would look at is getting the database to do more.

When you call .ToArray(), it brings all the selected points into memory. This is pointless, as you only want 2 per slice. So if you did something like:

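foreach (var slice in periodSlices)
{
    var q = context
                .Points
                .Where(p => p.DT >= slice.Begin && p.DT <= slice.End);
    // Each of the two calls below runs its own query and returns at most one row.
    var min = q.OrderBy(x => x.DT).FirstOrDefault();
    // LastOrDefault() is not supported by LINQ to Entities in EF6, so sort descending instead.
    var max = q.OrderByDescending(x => x.DT).FirstOrDefault();
}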

It might work better.

I say "might" because it depends on what indexes there are on the database and how many points are in each slice. Ultimately, to get really good performance you may have to add an index on the datetime column, change the structure so that the min and max are pre-stored, or do it in a stored procedure.
