SQL Aggregates OVER and PARTITION

All,

This is my first post on Stack Overflow, so go easy...

I am using SQL Server 2008.

I am fairly new to writing SQL queries, and I have a problem that I thought was pretty simple, but I've been fighting it for 2 days. I have a set of data that looks like this:

UserId          Duration(Seconds)        Month
1               45                       January
1               90                       January
1               50                       February
1               42                       February
2               80                       January
2               110                      February
3               45                       January
3               62                       January
3               56                       January
3               60                       February

Now, what I want is to write a single query that gives me the average for a particular user and compares it against the average across all users for that month. So the resulting dataset after a query for user #1 would look like this:

UserId         Duration(seconds)        OrganizationDuration(Seconds)        Month
1              67.5                     63                                   January
1              46                       65.5                                 February

I've been batting around different subquery and GROUP BY scenarios and nothing ever seems to work. Lately, I've been trying OVER and PARTITION BY, but with no success there either. My latest query looks like this:

select Userid, 
       AVG(duration) OVER () as OrgAverage,
       AVG(duration) as UserAverage,
       DATENAME(mm,MONTH(StartDate)) as Month
            from table.name 
            where YEAR(StartDate)=2014
            AND userid=119 
                  GROUP BY MONTH(StartDate), UserId     

This query fails with the error "'Duration' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause".

Please keep in mind I'm dealing with a very large amount of data. I think I can make it work with CASE statements, but I'm looking for a cleaner, more efficient way to write the query if possible.

Thank you!

The PARTITION BY clause is missing from the AVG function:

OVER ( Partition by MONTH(StartDate)) 

You are joining two queries together here:

  • Per-user average per month
  • Whole-organisation average per month

If you are only going to return data for one user at a time, then an inline select may give you joy:

SELECT AVG(a.duration) AS UserAverage,
   (SELECT AVG(b.Duration) FROM tbl b WHERE MONTH(b.StartDate) = MONTH(a.StartDate)) AS OrgAverage 
    ...
    FROM tbl a
    WHERE userid = 119 
    GROUP BY MONTH(StartDate), UserId

Note: comparing on MONTH() may be slow. You may be better off with a CTE (Common Table Expression).
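
The inline-select idea can be checked against the sample data from the question. What follows is a minimal sketch rather than the answerer's exact query: it uses Python's built-in sqlite3 as a stand-in for SQL Server, and the table name calls, its column names, and the function name are assumptions made for illustration. It also compares a plain month column instead of MONTH(StartDate).

```python
import sqlite3

# Sample rows from the question: (user_id, duration_seconds, month).
ROWS = [
    (1, 45, "January"), (1, 90, "January"), (1, 50, "February"), (1, 42, "February"),
    (2, 80, "January"), (2, 110, "February"),
    (3, 45, "January"), (3, 62, "January"), (3, 56, "January"), (3, 60, "February"),
]


def user_vs_org_subquery(user_id):
    """Return (user_id, user_average, org_average, month) rows for one user."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; SQLite is only standing in for SQL Server here.
    conn.execute("CREATE TABLE calls (user_id INTEGER, duration REAL, month TEXT)")
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", ROWS)
    result = conn.execute(
        """
        SELECT a.user_id,
               AVG(a.duration) AS user_average,
               -- correlated subquery: average over ALL users for the same month
               (SELECT AVG(b.duration) FROM calls b WHERE b.month = a.month) AS org_average,
               a.month
        FROM calls a
        WHERE a.user_id = ?
        GROUP BY a.month, a.user_id
        ORDER BY a.month DESC
        """,
        (user_id,),
    ).fetchall()
    conn.close()
    return result


for row in user_vs_org_subquery(1):
    print(row)
```

The subquery ignores the outer WHERE clause, which is why the organisation average still covers every user even though the outer query is restricted to one.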

I was able to get it done using a self join. There's probably a better way.

Select UserId, AVG(t1.Duration) as Duration, t2.duration as OrgDur, t1.Month 
from #temp t1
inner join (Select Distinct MONTH, AVG(Duration) over (partition by Month) as duration
from #temp) t2 on t2.Month = t1.Month
group by t1.Month, t1.UserId, t2.Duration 
order by t1.UserId, Month desc

Here's a version using a CTE, which is probably a better solution and definitely easier to read:

With MonthlyAverage
as 
(
Select MONTH, AVG(Duration) as OrgDur 
from #temp
group by Month
)

-- the CTE exposes the monthly average as OrgDur, so reference it by that name
Select UserId, AVG(t1.Duration) as Duration, m.OrgDur, t1.Month 
from #temp t1
inner join MonthlyAverage m on m.Month = t1.Month
group by UserId, t1.Month, m.OrgDur
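
To see the CTE approach produce the figures the question asks for, here is a small sketch run against the question's sample rows. It is an illustration under assumptions, not the answerer's exact code: Python's built-in sqlite3 stands in for SQL Server, the table calls and its column names are hypothetical, and a WHERE clause for a single user has been added.

```python
import sqlite3

# Sample rows from the question: (user_id, duration_seconds, month).
SAMPLE = [
    (1, 45, "January"), (1, 90, "January"), (1, 50, "February"), (1, 42, "February"),
    (2, 80, "January"), (2, 110, "February"),
    (3, 45, "January"), (3, 62, "January"), (3, 56, "January"), (3, 60, "February"),
]


def user_vs_org_cte(user_id):
    """Return (user_id, duration, org_dur, month) rows for one user via a CTE join."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; SQLite is only standing in for SQL Server here.
    conn.execute("CREATE TABLE calls (user_id INTEGER, duration REAL, month TEXT)")
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", SAMPLE)
    result = conn.execute(
        """
        WITH monthly_average AS (
            -- one row per month, averaged over every user
            SELECT month, AVG(duration) AS org_dur
            FROM calls
            GROUP BY month
        )
        SELECT t.user_id, AVG(t.duration) AS duration, m.org_dur, t.month
        FROM calls t
        INNER JOIN monthly_average m ON m.month = t.month
        WHERE t.user_id = ?
        GROUP BY t.user_id, t.month, m.org_dur
        ORDER BY t.month DESC
        """,
        (user_id,),
    ).fetchall()
    conn.close()
    return result


for row in user_vs_org_cte(1):
    print(row)
```

Because the monthly average is computed inside the CTE before the join, filtering the outer query to one user does not shrink the set of rows behind the organisation figure.
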

Please try this. It works fine for me.

WITH C1
AS
(
SELECT 
AVG(Duration) AS TotalAvg, 
[Month]
FROM [dbo].[Test]
GROUP BY [Month]
),
C2
AS
(
SELECT Distinct UserID,
AVG(Duration) OVER(PARTITION BY UserID, [Month]) AS DetailedAvg, -- no ORDER BY: SQL Server 2008 does not accept it in OVER for aggregates
[Month]
FROM [dbo].[Test]
)
SELECT C2.*, C1.TotalAvg
FROM C2 c2 
INNER JOIN C1 c1 ON c1.[Month] = c2.[Month]
ORDER BY c2.UserID, c2.[Month] desc;

You can also try the shorter version below.

SELECT Distinct UserID,
AVG(Duration)  OVER(PARTITION BY [Month]) AS TotalAvg,
AVG(Duration) OVER(PARTITION BY UserID, [Month]) AS DetailedAvg, -- no ORDER BY: SQL Server 2008 does not accept it in OVER for aggregates
[Month]
FROM [dbo].[Test]
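
One thing to watch with the windowed version: window functions only see rows that survive the WHERE clause, so restricting to a single user has to happen outside the query that computes the windows. The sketch below illustrates this on the question's sample rows. It relies on assumptions: Python's built-in sqlite3 (SQLite 3.25 or later, for window function support) stands in for SQL Server, and the table calls, its columns, and the filter_inside flag are hypothetical names introduced for the example.

```python
import sqlite3

# Sample rows from the question: (user_id, duration_seconds, month).
DATA = [
    (1, 45, "January"), (1, 90, "January"), (1, 50, "February"), (1, 42, "February"),
    (2, 80, "January"), (2, 110, "February"),
    (3, 45, "January"), (3, 62, "January"), (3, 56, "January"), (3, 60, "February"),
]

WINDOWED = """
    SELECT DISTINCT user_id,
           AVG(duration) OVER (PARTITION BY user_id, month) AS detailed_avg,
           AVG(duration) OVER (PARTITION BY month) AS total_avg,
           month
    FROM calls
"""


def user_vs_org_window(user_id, filter_inside=False):
    """Return (user_id, detailed_avg, total_avg, month) rows for one user.

    filter_inside=True applies the user filter before the windows are
    computed, which makes total_avg collapse to that user's own average.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; SQLite is only standing in for SQL Server here.
    conn.execute("CREATE TABLE calls (user_id INTEGER, duration REAL, month TEXT)")
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", DATA)
    if filter_inside:
        sql = WINDOWED + " WHERE user_id = ? ORDER BY month DESC"
    else:
        # Compute the windows over all rows first, then filter the result.
        sql = "SELECT * FROM (" + WINDOWED + ") AS w WHERE user_id = ? ORDER BY month DESC"
    result = conn.execute(sql, (user_id,)).fetchall()
    conn.close()
    return result


print(user_vs_org_window(1))                      # organisation average over all users
print(user_vs_org_window(1, filter_inside=True))  # organisation average degenerates
```

With the filter applied outside, user 1 gets the organisation averages of 63 and 65.5 from the question; with the filter inside, both average columns show the same per-user value.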
