COUNT DISTINCT for rolling n-day active users using T-SQL

Question

I am counting 7-day active users using T-SQL. I used the following code:

SELECT 
    *, 
    COUNT(DISTINCT [UserID]) OVER (
        PARTITION BY [HospitalID], [HospitalName], [Device]
        ORDER BY [Date]
        ROWS 7 PRECEDING
    ) AS [7-Day Active Users]
FROM UserActivity
ORDER BY [HospitalID], [HospitalName], [Device], [Date]

I was told: Use of DISTINCT is not allowed with the OVER clause. UserActivity is a table with columns HospitalID, HospitalName, Device (either phone or tablet), Date and UserID (could be NULL). To make things easier, I have filled the gaps between dates so that Date is consecutive and I can use ROWS 7 PRECEDING with confidence. I searched online extensively and found that most solutions either use other SQL dialects (which is not possible in my case) or use the DENSE_RANK function, which does not support a moving window. What is the correct and, hopefully, simple and concise way of solving my problem?

Sample Data: https://docs.google.com/spreadsheets/d/19vrBK8ixpiPJycRjb1ekiKnEUYk5AaUH/edit?usp=sharing&ouid=110206477774349430845&rtpof=true&sd=true

Answer 1

Sorry to see that COUNT DISTINCT was not supported in that type of SQL... I hadn't known that. Especially after you went to the trouble of fixing the gaps between dates!

I used Rasgo to generate the SQL, so this won't work directly in your version (it was tested with Snowflake), but I think it will work as long as you fix the DATEADD function. Every RDBMS seems to do DATEADD differently.

The general concept here is to join the table to itself, using a range condition on the date in the WHERE clause.

Luckily, this should work for you without having to fix the gaps in the dates first.

WITH BASIC_OFFSET_7DAY AS (
  SELECT 
    A.HOSPITALNAME, 
    A.HOSPITALID, 
    A.DEVICE, 
    A.DATE, 
    COUNT(DISTINCT B.USERID) as COUNT_DISTINCT_USERID_PAST7DAY, 
    COUNT(1) AS AGG_ROW_COUNT 
  FROM 
    UserActivity A 
    INNER JOIN UserActivity B ON A.HOSPITALNAME = B.HOSPITALNAME 
    AND A.HOSPITALID = B.HOSPITALID 
    AND A.DEVICE = B.DEVICE 
  WHERE 
    -- Window: rows in B dated from 7 days before A.DATE through A.DATE, both ends inclusive
    B.DATE >= DATEADD(day, -7, A.DATE) 
    AND B.DATE <= A.DATE 
  GROUP BY 
    A.HOSPITALNAME, 
    A.HOSPITALID, 
    A.DEVICE, 
    A.DATE
) 
SELECT 
  src.*, 
  BASIC_OFFSET_7DAY.COUNT_DISTINCT_USERID_PAST7DAY 
FROM 
  UserActivity src 
  LEFT OUTER JOIN BASIC_OFFSET_7DAY ON BASIC_OFFSET_7DAY.DATE = src.DATE 
  AND BASIC_OFFSET_7DAY.HOSPITALNAME = src.HOSPITALNAME 
  AND BASIC_OFFSET_7DAY.HOSPITALID = src.HOSPITALID 
  AND BASIC_OFFSET_7DAY.DEVICE = src.DEVICE

Let me know how that works out, and if it doesn't work I'll help you out.

Edit: For those who are trying to do this and getting stuck, a common mistake (one that I made myself when I did this by hand) is to count A.col instead of B.col. Pay careful attention to use COUNT(DISTINCT(B.col)) and not A.col. When I used Rasgo to generate the SQL to check myself, I caught my mistake. Hopefully this note helps someone in the future who makes the same mistake!
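
To make the B-versus-A point concrete, here is a minimal sketch that runs the same self-join shape on a toy table. It is not T-SQL: it uses SQLite through Python's sqlite3 module purely so it can run standalone, and SQLite writes the date arithmetic as date(A.Date, '-7 days') where the query above uses DATEADD. The sample rows and the helper names make_connection and rolling_distinct are hypothetical, made up for this illustration.

```python
import sqlite3

# Hypothetical toy rows: (HospitalID, HospitalName, Device, Date, UserID).
ROWS = [
    (1, "General", "phone", "2022-05-01", "u1"),
    (1, "General", "phone", "2022-05-02", "u2"),
    (1, "General", "phone", "2022-05-03", "u1"),
    (1, "General", "phone", "2022-05-03", None),  # UserID may be NULL
    (1, "General", "phone", "2022-05-10", "u3"),
]

# Same self-join shape as the CTE above; only the date arithmetic is SQLite's.
QUERY = """
SELECT A.Date,
       COUNT(DISTINCT B.UserID) AS from_b,  -- users seen anywhere in the window
       COUNT(DISTINCT A.UserID) AS from_a   -- the mistake: users on A.Date only
FROM UserActivity A
JOIN UserActivity B
  ON A.HospitalID = B.HospitalID
 AND A.HospitalName = B.HospitalName
 AND A.Device = B.Device
WHERE B.Date >= date(A.Date, ?)
  AND B.Date <= A.Date
GROUP BY A.HospitalID, A.HospitalName, A.Device, A.Date
ORDER BY A.Date
"""


def make_connection():
    """Create an in-memory database holding the toy UserActivity table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserActivity ("
        "HospitalID INTEGER, HospitalName TEXT, Device TEXT, Date TEXT, UserID TEXT)"
    )
    conn.executemany("INSERT INTO UserActivity VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def rolling_distinct(conn, days_back):
    """Return (Date, from_b, from_a) rows for a window reaching days_back days back."""
    return conn.execute(QUERY, (f"-{days_back} days",)).fetchall()


if __name__ == "__main__":
    conn = make_connection()
    for row in rolling_distinct(conn, 7):
        print(row)
```

On this toy data the from_a column never sees more than the users on the current date, while from_b picks up users from earlier dates in the window. It also shows a boundary detail worth checking against your own definition of "7-day": with an inclusive lower bound of 7 days back, the 2022-05-10 row still counts the user from 2022-05-03, so the window spans eight calendar dates; calling rolling_distinct(conn, 6) narrows it to seven.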
