EF6 Aggregation on large data sets

There are two tables, Events and Octaves:

+---------+-------+
| EventId | Time  |
+---------+-------+

+----------+---------+-----------+-------+
| OctaveId | EventId | Frequency | Value |
+----------+---------+-----------+-------+
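
For orientation, here is a rough sketch of entity classes that would match the two tables and the property names used in the LINQ queries in this thread (Time, Data, Event, Frequency, Value). The class shapes and every CLR type are assumptions, not the actual model; the generated SQL further down also refers to the time column as TimeEnd, so the real mapping may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity classes. Property names follow the LINQ queries in this
// thread; the CLR types are assumptions.
public class Event
{
    public int EventId { get; set; }
    public DateTime Time { get; set; }

    // Navigation property, used as x.Data in the queries.
    public virtual ICollection<Octave> Data { get; set; } = new List<Octave>();
}

public class Octave
{
    public int OctaveId { get; set; }
    public int EventId { get; set; }     // foreign key to Event
    public int Frequency { get; set; }   // assumed integer; could equally be a double
    public double Value { get; set; }    // the generated SQL handles the average as float

    // Navigation property, used as x.Event in the queries.
    public virtual Event Event { get; set; }
}
```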

On average there are 10 Octaves for each Event, and an Event is recorded every 10 seconds. Right now there are around 400k events and 4 million octaves. I want to filter the events in a specific time range, aggregate them by hour, and return for each hour the average Value of the Octaves that share the same Frequency. The EF6 LINQ code I'm using is:

_context.Events
      .Where(x => x.Time >= afterDate)
      .Where(x => x.Time <= beforeDate)
      .Select(x => new { year = x.Time.Year, month = x.Time.Month, day = x.Time.Day, hour = x.Time.Hour, data = x.Data })
      .GroupBy(x => new { year = x.year, month = x.month, day = x.day, hour = x.hour })
      .Where(x => x.Any())
      .Select(x => new
      {
         Time = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
         Data = x.SelectMany(y => y.data).GroupBy(y => new { frequency = y.Frequency }).Select(y => new
         {
            frequency  = y.Key.frequency,
            value = Math.Round(y.Average(z => z.Value), 1),
         })
      })
      .OrderByDescending(m => m.Time)
      .Take(limit);

This works, but only when the time span is very short (a few hours). If the span is increased to a few days, the query seems to run forever. Am I asking too much of SQL Server? Or is there a better way to run this query or structure my data? If I remove the SelectMany(...).GroupBy(...), it is no longer crazy slow.

The SQL query generated is:

SELECT 
    [Project5].[C1] AS [C1], 
    [Project5].[C2] AS [C2], 
    [Project5].[C3] AS [C3], 
    [Project5].[C4] AS [C4], 
    [Project5].[C5] AS [C5], 
    [Project5].[C6] AS [C6], 
    [Project5].[C8] AS [C7], 
    [Project5].[Frequency] AS [Frequency], 
    [Project5].[C7] AS [C8]
    FROM ( SELECT 
        [Limit1].[C1] AS [C1], 
        [Limit1].[C2] AS [C2], 
        [Limit1].[C3] AS [C3], 
        [Limit1].[C4] AS [C4], 
        [Limit1].[C5] AS [C5], 
        [Limit1].[C6] AS [C6], 
        CASE WHEN ([GroupBy1].[K1] IS NULL) THEN CAST(NULL AS float) ELSE ROUND([GroupBy1].[A1], 1) END AS [C7], 
        [GroupBy1].[K1] AS [Frequency], 
        CASE WHEN ([GroupBy1].[K1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C8]
        FROM   (SELECT TOP (10000) [Project4].[C1] AS [C1], [Project4].[C2] AS [C2], [Project4].[C3] AS [C3], [Project4].[C4] AS [C4], [Project4].[C5] AS [C5], [Project4].[C6] AS [C6]
            FROM ( SELECT 
                [Project2].[C1] AS [C1], 
                [Project2].[C2] AS [C2], 
                [Project2].[C3] AS [C3], 
                [Project2].[C4] AS [C4], 
                1 AS [C5], 
                convert (datetime2,right('000' + convert(varchar(255), [Project2].[C1]), 4) + '-' + convert(varchar(255), [Project2].[C2]) + '-' + convert(varchar(255), [Project2].[C3]) + ' ' + convert(varchar(255), [Project2].[C4]) + ':' + convert(varchar(255), 0) + ':' + str(cast(0 as float(53)), 10, 7), 121) AS [C6]
                FROM ( SELECT 
                    [Distinct1].[C1] AS [C1], 
                    [Distinct1].[C2] AS [C2], 
                    [Distinct1].[C3] AS [C3], 
                    [Distinct1].[C4] AS [C4]
                    FROM ( SELECT DISTINCT 
                        DATEPART (year, [Extent1].[TimeEnd]) AS [C1], 
                        DATEPART (month, [Extent1].[TimeEnd]) AS [C2], 
                        DATEPART (day, [Extent1].[TimeEnd]) AS [C3], 
                        DATEPART (hour, [Extent1].[TimeEnd]) AS [C4]
                        FROM [dbo].[Events] AS [Extent1]
                        WHERE ([Extent1].[TimeEnd] >= @p__linq__1) AND ([Extent1].[TimeEnd] <= @p__linq__2)
                    )  AS [Distinct1]
                )  AS [Project2]
                WHERE  EXISTS (SELECT 
                    1 AS [C1]
                    FROM [dbo].[Events] AS [Extent2]
                    WHERE ([Extent2].[TimeEnd] >= @p__linq__1) AND ([Extent2].[TimeEnd] <= @p__linq__2) AND (([Project2].[C1] = (DATEPART (year, [Extent2].[TimeEnd]))) OR (([Project2].[C1] IS NULL) AND (DATEPART (year, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C2] = (DATEPART (month, [Extent2].[TimeEnd]))) OR (([Project2].[C2] IS NULL) AND (DATEPART (month, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C3] = (DATEPART (day, [Extent2].[TimeEnd]))) OR (([Project2].[C3] IS NULL) AND (DATEPART (day, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C4] = (DATEPART (hour, [Extent2].[TimeEnd]))) OR (([Project2].[C4] IS NULL) AND (DATEPART (hour, [Extent2].[TimeEnd]) IS NULL)))
                )
            )  AS [Project4]
            ORDER BY [Project4].[C6] DESC ) AS [Limit1]
        OUTER APPLY  (SELECT 
            [Extent4].[Frequency] AS [K1], 
            AVG([Extent4].[Value]) AS [A1]
            FROM  [dbo].[Events] AS [Extent3]
            INNER JOIN [dbo].[Octaves] AS [Extent4] ON [Extent3].[EventId] = [Extent4].[EventId]
            WHERE ([Extent3].[TimeEnd] >= @p__linq__1) AND ([Extent3].[TimeEnd] <= @p__linq__2) AND (([Limit1].[C1] = (DATEPART (year, [Extent3].[TimeEnd]))) OR (([Limit1].[C1] IS NULL) AND (DATEPART (year, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C2] = (DATEPART (month, [Extent3].[TimeEnd]))) OR (([Limit1].[C2] IS NULL) AND (DATEPART (month, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C3] = (DATEPART (day, [Extent3].[TimeEnd]))) OR (([Limit1].[C3] IS NULL) AND (DATEPART (day, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C4] = (DATEPART (hour, [Extent3].[TimeEnd]))) OR (([Limit1].[C4] IS NULL) AND (DATEPART (hour, [Extent3].[TimeEnd]) IS NULL)))
            GROUP BY [Extent4].[Frequency] ) AS [GroupBy1]
    )  AS [Project5]
    ORDER BY [Project5].[C6] DESC, [Project5].[C1] ASC, [Project5].[C2] ASC, [Project5].[C3] ASC, [Project5].[C4] ASC, [Project5].[C8] ASC

UPDATE 1

I've tried to 'flip' the query by querying the octaves directly, and I'm getting better results. I first group them by date and frequency and calculate the average, then I group them again by time alone. It's not elegant at all, but it's the first solution that actually works. If the grouping is done differently (e.g. first by time, then by frequency, then averaged), it still won't work.

_context.Octaves
      .Where(x => x.Event.Time >= afterDate)
      .Where(x => x.Event.Time <= beforeDate)
      .GroupBy(x => new { year = x.Event.Time.Year, month = x.Event.Time.Month, day = x.Event.Time.Day, hour = x.Event.Time.Hour, freq = x.Frequency })
      .Select(x => new
      {
         year = x.Key.year,
         month = x.Key.month,
         day = x.Key.day,
         hour = x.Key.hour,
         freq = x.Key.freq,
         value = Math.Round(x.Average(y => y.Value), 1)
      })
      .GroupBy(x => new { year = x.year, month = x.month, day = x.day, hour = x.hour })
      .Select(x => new
      {
         timeEnd = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
         data = x.Select(y => new { freq = y.freq, value = y.value })
      })
      .OrderByDescending(m => m.timeEnd)
      .Take(limit)
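
To make the two-stage grouping easier to follow, here is a minimal sketch of the same logic run with LINQ to Objects over an in-memory list instead of an EF6 context. The `Reading`, `FrequencyAverage`, and `HourBucket` types and the `Aggregate` method are hypothetical names introduced for illustration only, and the sketch says nothing about how SQL Server would execute the equivalent query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HourlyOctaveAverages
{
    // Hypothetical flat row: one octave reading together with its event time.
    public sealed class Reading
    {
        public DateTime Time { get; set; }
        public int Frequency { get; set; }   // type is an assumption
        public double Value { get; set; }
    }

    public sealed class FrequencyAverage
    {
        public int Frequency { get; set; }
        public double Value { get; set; }
    }

    public sealed class HourBucket
    {
        public DateTime TimeEnd { get; set; }
        public List<FrequencyAverage> Data { get; set; }
    }

    public static List<HourBucket> Aggregate(
        IEnumerable<Reading> readings, DateTime afterDate, DateTime beforeDate, int limit)
    {
        return readings
            .Where(r => r.Time >= afterDate && r.Time <= beforeDate)
            // Stage 1: one flat group per (hour, frequency) pair, averaged.
            .GroupBy(r => new
            {
                Hour = new DateTime(r.Time.Year, r.Time.Month, r.Time.Day, r.Time.Hour, 0, 0),
                r.Frequency
            })
            .Select(g => new
            {
                g.Key.Hour,
                g.Key.Frequency,
                Value = Math.Round(g.Average(r => r.Value), 1)
            })
            // Stage 2: regroup the already-averaged rows by hour only.
            .GroupBy(x => x.Hour)
            .Select(g => new HourBucket
            {
                TimeEnd = g.Key,
                Data = g.Select(x => new FrequencyAverage { Frequency = x.Frequency, Value = x.Value })
                        .ToList()
            })
            .OrderByDescending(b => b.TimeEnd)
            .Take(limit)
            .ToList();
    }
}
```

Stage 1 is a plain flat aggregation and stage 2 only reshapes rows that are already averaged. In an EF6 query, one option that may be worth trying (untested here) is to materialize the stage 1 rows and do the regrouping in memory, although `Take(limit)` would then apply to hours on the client rather than on the server.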

I am not sure, but you might want to try this. It might turn out to be worse.

_context.Events.AsNoTracking()
    .Where(x => x.Time >= afterDate && x.Time <= beforeDate)
    // Group by the hour components of the event time directly.
    .GroupBy(x => new { year = x.Time.Year, month = x.Time.Month, day = x.Time.Day, hour = x.Time.Hour })
    .Select(x => new
    {
        Time = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
        // Flatten the octaves of all events in the hour, then average per frequency.
        Data = x.SelectMany(y => y.Data)
                .GroupBy(o => o.Frequency)
                .Select(g => new
                {
                    frequency = g.Key,
                    value = Math.Round(g.Average(z => z.Value), 1)
                })
    })
    .OrderByDescending(m => m.Time)
    .Take(limit);
