How to aggregate (counting distinct items) over a sliding window in SQL Server?
I am currently using this query (in SQL Server) to count the number of unique items each day:
SELECT Date, COUNT(DISTINCT item)
FROM myTable
GROUP BY Date
ORDER BY Date
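How can I transform this to get, for each date, the number of unique items over the past 3 days (including the current day)?
The output should be a table with 2 columns: the first column contains all dates in the original table, and the second column contains the number of unique items for each date.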
For example, if the original table is:
Date Item
01/01/2018 A
01/01/2018 B
02/01/2018 C
03/01/2018 C
04/01/2018 C
With my query above, I currently get the unique count per day:
Date count
01/01/2018 2
02/01/2018 1
03/01/2018 1
04/01/2018 1
I would like to get the unique count over a 3-day rolling window:
Date count
01/01/2018 2
02/01/2018 3 (because of items A, B and C on 1st and 2nd Jan)
03/01/2018 3 (because of items A, B and C on 1st, 2nd and 3rd Jan)
04/01/2018 1 (because only item C appears on 2nd, 3rd and 4th Jan)
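To pin down the expected semantics, here is a brute-force reference in plain Python. It is a sketch for checking results, not a SQL Server solution. For each date that occurs in the data, it collects the distinct items dated within the inclusive range from two days earlier through that date. The function name `rolling_distinct` and its `days` parameter are illustrative assumptions.

```python
from datetime import date, timedelta

# Question's sample data (the table above uses dd/mm/yyyy)
ROWS = [
    (date(2018, 1, 1), "A"),
    (date(2018, 1, 1), "B"),
    (date(2018, 1, 2), "C"),
    (date(2018, 1, 3), "C"),
    (date(2018, 1, 4), "C"),
]

def rolling_distinct(rows, days=3):
    """For each date present in rows, count distinct items in [date - (days - 1), date]."""
    result = []
    for d in sorted({day for day, _ in rows}):
        start = d - timedelta(days=days - 1)
        items = {item for day, item in rows if start <= day <= d}
        result.append((d, len(items)))
    return result

for d, n in rolling_distinct(ROWS):
    print(d.isoformat(), n)
```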
Using apply provides a convenient way to form sliding windows:
CREATE TABLE myTable
([DateCol] datetime, [Item] varchar(1))
;
INSERT INTO myTable
([DateCol], [Item])
VALUES
('2018-01-01 00:00:00', 'A'),
('2018-01-01 00:00:00', 'B'),
('2018-01-02 00:00:00', 'C'),
('2018-01-03 00:00:00', 'C'),
('2018-01-04 00:00:00', 'C')
;
CREATE NONCLUSTERED INDEX IX_DateCol
ON myTable([DateCol])
;
Query:
select distinct
t1.dateCol
, oa.ItemCount
from myTable t1
outer apply (
select count(distinct t2.item) as ItemCount
from myTable t2
where t2.DateCol between dateadd(day,-2,t1.DateCol) and t1.DateCol
) oa
order by t1.dateCol ASC
Result:
| dateCol | ItemCount |
|----------------------|-----------|
| 2018-01-01T00:00:00Z | 2 |
| 2018-01-02T00:00:00Z | 3 |
| 2018-01-03T00:00:00Z | 3 |
| 2018-01-04T00:00:00Z | 1 |
Some performance gain may be possible by reducing the date column to its distinct values before using apply, like so:
select
d.DateCol
, oa.ItemCount
from (
select distinct t1.DateCol
from myTable t1
) d
outer apply (
select count(distinct t2.item) as ItemCount
from myTable t2
where t2.DateCol between dateadd(day,-2,d.DateCol) and d.DateCol
) oa
order by d.DateCol ASC
;
You can use group by instead of select distinct in that subquery, but the execution plan will stay the same.
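SQLite has no `OUTER APPLY`, but the same per-date window can be written as a correlated scalar subquery. The sketch below assumes SQLite through Python's `sqlite3` module with dates stored as ISO `YYYY-MM-DD` strings. In it, `date(x, '-2 days')` plays the role of `dateadd(day, -2, x)`. It illustrates the windowing logic and is not a drop-in replacement for the T-SQL above.

```python
import sqlite3

# Question's sample data, with dates as ISO 'YYYY-MM-DD' strings
ROWS = [
    ("2018-01-01", "A"),
    ("2018-01-01", "B"),
    ("2018-01-02", "C"),
    ("2018-01-03", "C"),
    ("2018-01-04", "C"),
]

# Correlated scalar subquery in place of OUTER APPLY: for each outer row,
# count distinct items dated from two days earlier through that row's date.
QUERY = """
SELECT DISTINCT t1.DateCol,
       (SELECT COUNT(DISTINCT t2.Item)
          FROM myTable AS t2
         WHERE t2.DateCol BETWEEN date(t1.DateCol, '-2 days') AND t1.DateCol
       ) AS ItemCount
  FROM myTable AS t1
 ORDER BY t1.DateCol
"""

def rolling_counts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (DateCol TEXT, Item TEXT)")
    conn.execute("CREATE INDEX IX_DateCol ON myTable (DateCol)")
    conn.executemany("INSERT INTO myTable (DateCol, Item) VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(rolling_counts(ROWS))
```

Storing the dates as ISO strings keeps the `BETWEEN` comparisons in chronological order. With SQL Server's `datetime` type that concern does not arise.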
The most straightforward solution is to join the table to itself on the date:
SELECT t1.DateCol, COUNT(DISTINCT t2.Item) AS C
FROM myTable AS t1
LEFT JOIN myTable AS t2 ON t2.DateCol BETWEEN DATEADD(dd, -2, t1.DateCol) AND t1.DateCol
GROUP BY t1.DateCol
ORDER BY t1.DateCol
Output:
| DateCol | C |
|-------------------------|---|
| 2018-01-01 00:00:00.000 | 2 |
| 2018-01-02 00:00:00.000 | 3 |
| 2018-01-03 00:00:00.000 | 3 |
| 2018-01-04 00:00:00.000 | 1 |
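The self-join can be tried out with SQLite from Python's `sqlite3` module. SQLite stands in for SQL Server here, and dates are stored as ISO strings. Besides the counts, the sketch reports how many rows the join produces before `GROUP BY` collapses them. Every row is paired with every row in its 3-day range, so this intermediate result grows with the number of rows per window.

```python
import sqlite3

# Question's sample data, with dates as ISO 'YYYY-MM-DD' strings
ROWS = [
    ("2018-01-01", "A"),
    ("2018-01-01", "B"),
    ("2018-01-02", "C"),
    ("2018-01-03", "C"),
    ("2018-01-04", "C"),
]

# Self-join on a date range, then aggregate per outer date.
WINDOW_QUERY = """
SELECT t1.DateCol, COUNT(DISTINCT t2.Item) AS C
  FROM myTable AS t1
  LEFT JOIN myTable AS t2
    ON t2.DateCol BETWEEN date(t1.DateCol, '-2 days') AND t1.DateCol
 GROUP BY t1.DateCol
 ORDER BY t1.DateCol
"""

# The same join without aggregation: number of rows fed into GROUP BY.
JOIN_SIZE_QUERY = """
SELECT COUNT(*)
  FROM myTable AS t1
  LEFT JOIN myTable AS t2
    ON t2.DateCol BETWEEN date(t1.DateCol, '-2 days') AND t1.DateCol
"""

def run(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (DateCol TEXT, Item TEXT)")
    conn.executemany("INSERT INTO myTable (DateCol, Item) VALUES (?, ?)", rows)
    counts = conn.execute(WINDOW_QUERY).fetchall()
    (join_rows,) = conn.execute(JOIN_SIZE_QUERY).fetchone()
    conn.close()
    return counts, join_rows

counts, join_rows = run(ROWS)
print(counts)      # per-date distinct counts
print(join_rows)   # intermediate rows produced by the self-join
```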
GROUP BY should be faster than DISTINCT (make sure there is an index on the Date column):
DECLARE @tbl TABLE([Date] DATE, [Item] VARCHAR(100))
;
INSERT INTO @tbl VALUES
('2018-01-01 00:00:00', 'A'),
('2018-01-01 00:00:00', 'B'),
('2018-01-02 00:00:00', 'C'),
('2018-01-03 00:00:00', 'C'),
('2018-01-04 00:00:00', 'C');
SELECT t.[Date]
--For checking only; you can remove this column
,(SELECT DISTINCT t2.[Item] AS [*]
FROM @tbl AS t2
WHERE t2.[Date]<=t.[Date]
AND t2.[Date]>=DATEADD(DAY,-2,t.[Date]) FOR XML PATH('')) AS CountedItems
--This sub-select returns your counts
,(SELECT COUNT(DISTINCT t2.[Item])
FROM @tbl AS t2
WHERE t2.[Date]<=t.[Date]
AND t2.[Date]>=DATEADD(DAY,-2,t.[Date])) AS ItemCount
FROM @tbl AS t
GROUP BY t.[Date];
Result:
Date CountedItems ItemCount
2018-01-01 AB 2
2018-01-02 ABC 3
2018-01-03 ABC 3
2018-01-04 C 1
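A rough SQLite analogue of this query can be run from Python's `sqlite3` module. SQLite has no `FOR XML PATH('')`, so the verification column uses `group_concat(DISTINCT ...)` instead. That function separates items with commas and does not guarantee their order, so the sketch sorts them before printing. The table name `tbl` and ISO date strings are assumptions for illustration.

```python
import sqlite3

# Question's sample data, with dates as ISO 'YYYY-MM-DD' strings
ROWS = [
    ("2018-01-01", "A"),
    ("2018-01-01", "B"),
    ("2018-01-02", "C"),
    ("2018-01-03", "C"),
    ("2018-01-04", "C"),
]

# group_concat(DISTINCT ...) stands in for FOR XML PATH('') in the check column;
# the second subquery returns the actual count. The outer GROUP BY yields one row per date.
QUERY = """
SELECT t.Date,
       (SELECT group_concat(DISTINCT t2.Item)
          FROM tbl AS t2
         WHERE t2.Date <= t.Date
           AND t2.Date >= date(t.Date, '-2 days')) AS CountedItems,
       (SELECT COUNT(DISTINCT t2.Item)
          FROM tbl AS t2
         WHERE t2.Date <= t.Date
           AND t2.Date >= date(t.Date, '-2 days')) AS ItemCount
  FROM tbl AS t
 GROUP BY t.Date
 ORDER BY t.Date
"""

def counts_with_items(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (Date TEXT, Item TEXT)")
    conn.executemany("INSERT INTO tbl (Date, Item) VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for day, items, n in counts_with_items(ROWS):
    print(day, sorted(items.split(",")), n)
```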
This solution is different from the others. Could you check how this query performs on real data compared with the other answers?
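The basic idea is that each row can take part in the window for its own date, for the next day, or for the day after that. So the rows are first expanded into three, each with a different date attached, and then a regular COUNT(DISTINCT) aggregate can be used on the computed date. The HAVING clause only avoids returning results for dates that were computed but do not exist in the underlying data.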
with cte(Date, Item) as (
select convert(datetime, a, 103), b -- style 103 parses dd/mm/yyyy regardless of the session's DATEFORMAT
from (values
('01/01/2018','A')
,('01/01/2018','B')
,('02/01/2018','C')
,('03/01/2018','C')
,('04/01/2018','C')) t(a,b)
)
select
[Date] = dateadd(dd, n, Date), [Count] = count(distinct Item)
from
cte
cross join (values (0),(1),(2)) t(n)
group by dateadd(dd, n, Date)
having max(iif(n = 0, 1, 0)) = 1
option (force order)
Output:
| Date | Count |
|-------------------------|-------|
| 2018-01-01 00:00:00.000 | 2 |
| 2018-01-02 00:00:00.000 | 3 |
| 2018-01-03 00:00:00.000 | 3 |
| 2018-01-04 00:00:00.000 | 1 |
This may be faster if you have many duplicate rows:
select
[Date] = dateadd(dd, n, Date), [Count] = count(distinct Item)
from
(select distinct Date, Item from cte) c
cross join (values (0),(1),(2)) t(n)
group by dateadd(dd, n, Date)
having max(iif(n = 0, 1, 0)) = 1
option (force order)
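The row-expansion idea behind the two queries above can be sketched in plain Python. Each row is copied to the windows ending on its own date and the following two dates, and the rows are aggregated per window end date. Only dates that have a row of their own are kept, which is what the `HAVING max(iif(n = 0, 1, 0)) = 1` filter does. The function name is illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta

# Question's sample data
ROWS = [
    (date(2018, 1, 1), "A"),
    (date(2018, 1, 1), "B"),
    (date(2018, 1, 2), "C"),
    (date(2018, 1, 3), "C"),
    (date(2018, 1, 4), "C"),
]

def rolling_distinct_by_expansion(rows, days=3):
    """Count distinct items per window end date by expanding rows instead of joining.

    Like CROSS JOIN (values (0),(1),(2)): a row dated d is added to the windows
    ending on d + n for n = 0 .. days - 1. Like the HAVING clause, only window
    ends that have a row of their own (n == 0) are reported.
    """
    items_by_end = defaultdict(set)  # window end date -> distinct items seen
    own_dates = set()                # dates that occur in the data (n == 0)
    for d, item in rows:
        for n in range(days):
            end = d + timedelta(days=n)
            items_by_end[end].add(item)
            if n == 0:
                own_dates.add(end)
    return [(end, len(items_by_end[end]))
            for end in sorted(items_by_end) if end in own_dates]

for end, count in rolling_distinct_by_expansion(ROWS):
    print(end.isoformat(), count)
```

Passing `set(rows)` instead of `rows` corresponds to the `select distinct` variant. The result is unchanged, and duplicate rows are no longer expanded.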
Use the GETDATE() function to get the current date and DATEADD() to get the last 3 days:
SELECT Date, count(DISTINCT item)
FROM myTable
WHERE [Date] >= DATEADD(day,-3, GETDATE())
GROUP BY Date
ORDER BY Date
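The same filter can be sketched in SQLite from Python's `sqlite3` module. In this sketch, SQLite's `datetime('now', '-3 days')` stands in for `DATEADD(day, -3, GETDATE())`. Both keep the time of day, and SQLite's `'now'` is in UTC. The sample rows are made up and dated relative to today so that the filter keeps some rows and drops others. The query restricts rows to the recent period; each day's count is still taken over that single day.

```python
import sqlite3

# Hypothetical rows dated relative to "today" (UTC in SQLite).
SETUP = """
CREATE TABLE myTable (Date TEXT, item TEXT);
INSERT INTO myTable VALUES (date('now'), 'A');
INSERT INTO myTable VALUES (date('now'), 'B');
INSERT INTO myTable VALUES (date('now', '-1 day'), 'C');
INSERT INTO myTable VALUES (date('now', '-5 days'), 'D');
"""

# datetime('now', '-3 days') keeps the time of day, like DATEADD(day, -3, GETDATE()).
QUERY = """
SELECT Date, COUNT(DISTINCT item)
  FROM myTable
 WHERE Date >= datetime('now', '-3 days')
 GROUP BY Date
 ORDER BY Date
"""

def recent_counts():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(recent_counts())
```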
SELECT DISTINCT Date,
(SELECT COUNT(DISTINCT item)
FROM myTable t2
WHERE t2.Date BETWEEN DATEADD(day, -2, t1.Date) AND t1.Date) AS count
FROM myTable t1
ORDER BY Date;
Rextester demo: http://rextester.com/ZRDQ22190
Since COUNT(DISTINCT item) OVER (PARTITION BY [Date]) is not supported, you can emulate it with dense_rank:
SELECT Date, dense_rank() over (partition by [Date] order by [item])
+ dense_rank() over (partition by [Date] order by [item] desc)
- 1 as count_distinct_item
FROM myTable
One thing to note is that dense_rank counts NULL as a value, whereas COUNT does not.
See this post for more details.
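Both the dense_rank identity and the NULL caveat can be checked in SQLite from Python's `sqlite3` module. Window functions require SQLite 3.25 or later. SQLite stands in for SQL Server here; both sort NULL first in ascending order. Like the query above, this counts per date, not over a sliding window. The sample rows are made up to include a duplicate and a NULL.

```python
import sqlite3

ROWS = [
    ("2018-01-01", "A"),
    ("2018-01-01", "B"),
    ("2018-01-02", "C"),
    ("2018-01-02", "C"),   # duplicate item: both rows get the same dense_rank
    ("2018-01-05", "C"),
    ("2018-01-05", None),  # NULL item: dense_rank ranks it, COUNT ignores it
]

# For a value with ascending dense rank r among d distinct values in its
# partition, the descending dense rank is d - r + 1, so r + (d - r + 1) - 1 = d.
DENSE_RANK_QUERY = """
SELECT DISTINCT Date, count_distinct_item
  FROM (SELECT Date,
               dense_rank() OVER (PARTITION BY Date ORDER BY item)
             + dense_rank() OVER (PARTITION BY Date ORDER BY item DESC)
             - 1 AS count_distinct_item
          FROM myTable)
 ORDER BY Date
"""

COUNT_QUERY = "SELECT Date, COUNT(DISTINCT item) FROM myTable GROUP BY Date ORDER BY Date"

def compare(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (Date TEXT, item TEXT)")
    conn.executemany("INSERT INTO myTable (Date, item) VALUES (?, ?)", rows)
    emulated = conn.execute(DENSE_RANK_QUERY).fetchall()
    exact = conn.execute(COUNT_QUERY).fetchall()
    conn.close()
    return emulated, exact

emulated, exact = compare(ROWS)
print(emulated)  # dense_rank emulation, NULL counted on 2018-01-05
print(exact)     # COUNT(DISTINCT item), NULL ignored
```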
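Here is a simple solution that uses myTable itself as the source of the dates to group by (edited for SQL Server's dateadd). Note that this query assumes there is at least one record in myTable for every date; if a date has no records, it will not appear in the query results, even if there are records for the two previous days: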
select
date,
(select
count(distinct item)
from (select distinct date, item from myTable) as d2
where
d2.date between dateadd(day,-2,d.date) and d.date
) as count
from (select distinct date from myTable) as d
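The caveat about missing dates is easy to reproduce. The sketch below is a SQLite version of the query above, run through Python's `sqlite3` module with ISO date strings and an added `ORDER BY` for a stable result. Its made-up data has no rows on 2018-01-02, so that date gets no output row even though its 3-day window would contain item A.

```python
import sqlite3

# Made-up data with a gap: nothing is recorded on 2018-01-02.
ROWS = [("2018-01-01", "A"), ("2018-01-03", "B")]

# The outer date list comes from myTable itself, so absent dates never appear.
QUERY = """
SELECT d.date,
       (SELECT COUNT(DISTINCT d2.item)
          FROM (SELECT DISTINCT date, item FROM myTable) AS d2
         WHERE d2.date BETWEEN date(d.date, '-2 days') AND d.date) AS item_count
  FROM (SELECT DISTINCT date FROM myTable) AS d
 ORDER BY d.date
"""

def counts_by_present_date(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (date TEXT, item TEXT)")
    conn.executemany("INSERT INTO myTable (date, item) VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(counts_by_present_date(ROWS))
```

If every calendar date is needed, the outer date list would have to come from a separate list of dates (for example, a calendar table) instead of from myTable.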
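I solved this with math.
Write any day number as $z = 3x + y$, where $x = \lfloor z/3 \rfloor$ and $y = z \bmod 3$. For day $z$ I need the days from $3(x-1) + y + 1$ to $3(x-1) + y + 3$; the upper end is $3(x-1) + y + 3 = 3x + y = z$, and the lower end, in terms of $z$, is
$$3(x-1) + y + 1 = 3\left(\lfloor z/3 \rfloor - 1\right) + (z \bmod 3) + 1$$
In this case, I can use group by on the dates between $3(\lfloor z/3 \rfloor - 1) + (z \bmod 3) + 1$ and $z$: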
SELECT iif(OrderDate between 3 * (cast(OrderDate as int) / 3 - 1) + (cast(OrderDate as int) % 3) + 1
and orderdate, Orderdate, 0)
, count(sh.SalesOrderID) FROM Sales.SalesOrderDetail shd
JOIN Sales.SalesOrderHeader sh on sh.SalesOrderID = shd.SalesOrderID
group by iif(OrderDate between 3 * (cast(OrderDate as int) / 3 - 1) + (cast(OrderDate as int) % 3) + 1
and orderdate, Orderdate, 0)
order by iif(OrderDate between 3 * (cast(OrderDate as int) / 3 - 1) + (cast(OrderDate as int) % 3) + 1
and orderdate, Orderdate, 0)
If you need a different number of days per group, you can use:
declare @n int = 4; -- another day count
SELECT iif(OrderDate between @n * (cast(OrderDate as int) / @n - 1) + (cast(OrderDate as int) % @n) + 1
and orderdate, Orderdate, 0)
, count(sh.SalesOrderID) FROM Sales.SalesOrderDetail shd
JOIN Sales.SalesOrderHeader sh on sh.SalesOrderID = shd.SalesOrderID
group by iif(OrderDate between @n * (cast(OrderDate as int) / @n - 1) + (cast(OrderDate as int) % @n) + 1
and orderdate, Orderdate, 0)
order by iif(OrderDate between @n * (cast(OrderDate as int) / @n - 1) + (cast(OrderDate as int) % @n) + 1
and orderdate, Orderdate, 0)
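The bounds in the derivation can be sanity-checked numerically. In this Python sketch, `lower_bound` is a hypothetical helper. Python's `//` and `%` agree with T-SQL integer division and modulo for the non-negative day numbers that `cast(OrderDate as int)` produces. Because $n \lfloor z/n \rfloor + (z \bmod n) = z$, the lower bound always equals $z - n + 1$, so the range from the lower bound to $z$ covers exactly the $n$ days ending on $z$.

```python
def lower_bound(z: int, n: int = 3) -> int:
    """Lower end of the n-day range for day number z, as written in the SQL above."""
    return n * (z // n - 1) + z % n + 1

# Since n * (z // n) + z % n == z, the lower bound simplifies to z - n + 1.
for n in (3, 4):
    assert all(lower_bound(z, n) == z - n + 1 for z in range(100_000))

print(lower_bound(43100), lower_bound(43100, 4))
```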
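Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.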