Partition Function COUNT() OVER possible using DISTINCT
I'm trying to write the following in order to get a running total of distinct NumUsers, like so:
NumUsers = COUNT(DISTINCT [UserAccountKey]) OVER (PARTITION BY [Mth])
Management Studio doesn't seem too happy about this. The error disappears when I remove the DISTINCT keyword, but then it won't be a distinct count.
DISTINCT does not appear to be possible within the partition functions. How do I go about finding the distinct count? Do I use a more traditional method such as a correlated subquery?
Looking into this a bit further, maybe these OVER functions work differently from Oracle's, in that they cannot be used in SQL Server to calculate running totals.
I've added a live example here on SQLfiddle where I attempt to use a partition function to calculate a running total.
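For comparison, the "more traditional method" mentioned above could look roughly like the following sketch. It computes the distinct count per [Mth] in a grouped derived table and joins it back to the detail rows. The CTE name UserActivity and its sample rows are hypothetical, made up only so the query can run on its own.

```sql
-- UserActivity and its rows are assumed sample data, not from the question.
WITH UserActivity (Mth, UserAccountKey) AS (
    SELECT 201301, 10 UNION ALL
    SELECT 201301, 10 UNION ALL
    SELECT 201301, 20 UNION ALL
    SELECT 201302, 10
)
SELECT ua.Mth,
       ua.UserAccountKey,
       m.NumUsers
FROM UserActivity AS ua
JOIN (
    -- COUNT(DISTINCT ...) is allowed in an ordinary GROUP BY aggregate.
    SELECT Mth, COUNT(DISTINCT UserAccountKey) AS NumUsers
    FROM UserActivity
    GROUP BY Mth
) AS m
    ON m.Mth = ua.Mth;
-- With the sample rows, every 201301 row gets NumUsers = 2
-- and the 201302 row gets NumUsers = 1.
```

Note that this gives a distinct count per month repeated on each row, which is what PARTITION BY [Mth] would produce, rather than a running total across months.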
There is a solution in simple SQL:
SELECT COUNT(DISTINCT user) OVER(ORDER BY time) AS users
FROM users
=>
SELECT COUNT(*) OVER(ORDER BY time) AS users
FROM (
SELECT user, MIN(time) AS time
FROM users
GROUP BY user
) t
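A small runnable sketch of that rewrite, under some assumptions: the CTE user_events and its four rows are invented for illustration, and the columns are renamed to user_name and event_time to stay clear of reserved words. An ORDER BY inside an aggregate OVER clause needs SQL Server 2012 or later.

```sql
-- user_events, user_name and event_time are hypothetical names and data.
WITH user_events (user_name, event_time) AS (
    SELECT 'a', 1 UNION ALL
    SELECT 'b', 2 UNION ALL
    SELECT 'a', 3 UNION ALL
    SELECT 'c', 4
),
first_seen AS (
    -- One row per user, stamped with the first time that user appears.
    SELECT user_name, MIN(event_time) AS first_time
    FROM user_events
    GROUP BY user_name
)
SELECT first_time,
       -- Counting first appearances up to each time gives the running
       -- number of distinct users.
       COUNT(*) OVER (ORDER BY first_time) AS running_distinct_users
FROM first_seen
ORDER BY first_time;
-- Result for the sample rows:
--   first_time  running_distinct_users
--   1           1
--   2           2
--   4           3
```

One thing to keep in mind: the rewritten query returns one row per distinct user, not one row per source row, so the event at time 3 (a repeat visit by 'a') has no row of its own in the output.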
I wandered in here with essentially the same question as whytheq and found David's solution, but then had to review my old self-tutorial notes regarding DENSE_RANK because I use it so rarely: why DENSE_RANK instead of RANK or ROW_NUMBER, and how does it actually work? In the process, I updated that tutorial to include David's solution for this particular problem, and then thought it might be helpful for SQL newbies (or others like me who forget stuff).
The whole tutorial text can be copy/pasted into a query editor and then each example query can be (separately) uncommented and run, to see its results. (By default, the solution to this problem is uncommented at the bottom.) Or, each example can be copied separately into its own query editor instance, but the TBLx CTE must be included with each.
--WITH /* DB2 version */
--TBLx (Col_A, Col_B) AS (VALUES
-- ( 1, 1 ),
-- ( 1, 1 ),
-- ( 1, 1 ),
-- ( 1, 2 ))
WITH /* SQL-Server version */
TBLx (Col_A, Col_B) AS
(SELECT 1, 1 UNION ALL
SELECT 1, 1 UNION ALL
SELECT 1, 1 UNION ALL
SELECT 1, 2)
/*** Example-A: demonstrates the difference between ROW_NUMBER, RANK and DENSE_RANK ***/
--SELECT Col_A, Col_B,
-- ROW_NUMBER() OVER(PARTITION BY Col_A ORDER BY Col_B) AS ROW_NUMBER_,
-- RANK() OVER(PARTITION BY Col_A ORDER BY Col_B) AS RANK_,
-- DENSE_RANK() OVER(PARTITION BY Col_A ORDER BY Col_B) AS DENSE_RANK_
--FROM TBLx
/* RESULTS:
Col_A Col_B ROW_NUMBER_ RANK_ DENSE_RANK_
1 1 1 1 1
1 1 2 1 1
1 1 3 1 1
1 2 4 4 2
ROW_NUMBER: Just increments for the three identical rows and increments again for the final unique row.
That is, it’s an order-value (based on "sort" order) but makes no other distinction.
RANK: Assigns the same rank value to the three identical rows, then jumps to 4 for the fourth row,
which is *unique* with regard to the others.
That is, each identical row is ranked by the rank-order of the first row-instance of that
(identical) value-set.
DENSE_RANK: Also assigns the same rank value to the three identical rows but the fourth *unique* row is
assigned a value of 2.
That is, DENSE_RANK identifies that there are (only) two *unique* row-types in the row set.
*/
/*** Example-B: to get only the distinct resulting "count-of-each-row-type" rows ***/
-- SELECT DISTINCT -- For unique returned "count-of-each-row-type" rows, the DISTINCT operator is necessary because
-- -- the calculated DENSE_RANK value is appended to *all* rows in the data set. Without DISTINCT,
-- -- its value for each original-data row-type will just be replicated for each of those rows.
--
-- Col_A, Col_B,
-- DENSE_RANK() OVER(PARTITION BY Col_A ORDER BY Col_B) AS DISTINCT_ROWTYPE_COUNT_
-- FROM TBLx
/* RESULTS:
Col_A Col_B DISTINCT_ROWTYPE_COUNT_
1 1 1
1 2 2
*/
/*** Example-C.1: demonstrates the derivation of the "count-of-all-row-types" (finalized in Example-C.2, below) ***/
-- SELECT
-- Col_A, Col_B,
--
-- DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC) AS ROW_TYPES_COUNT_ASC_,
-- DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC) AS ROW_TYPES_COUNT_DESC_,
--
-- -- Adding the above cases together and subtracting one gives the same total count on each resulting row:
--
-- DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC)
-- +
-- DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC)
-- - 1 /* (Because DENSE_RANK values are one-based) */
-- AS ROW_TYPES_COUNT_
-- FROM TBLx
/* RESULTS:
COL_A COL_B ROW_TYPES_COUNT_ASC_ ROW_TYPES_COUNT_DESC_ ROW_TYPES_COUNT_
1 2 2 1 2
1 1 1 2 2
1 1 1 2 2
1 1 1 2 2
*/
/*** Example-C.2: uses the above technique to get a *single* resulting "count-of-all-row-types" row ***/
SELECT DISTINCT -- For a single returned "count-of-all-row-types" row, the DISTINCT operator is necessary because the
-- calculated DENSE_RANK value is appended to *all* rows in the data set. Without DISTINCT, that
-- value will just be replicated for each original-data row.
-- Col_A, Col_B, -- In order to get a *single* returned "count-of-all-row-types" row (and field), all other fields
-- must be excluded because their respective differing row-values will defeat the purpose of the
-- DISTINCT operator, above.
DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC)
+
DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC)
- 1 /* (Because DENSE_RANK values are one-based) */
AS ROW_TYPES_COUNT_
FROM TBLx
/* RESULTS:
ROW_TYPES_COUNT_
2
*/
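Applied to the column names from the original question, the same technique might look like the sketch below. The CTE UserActivity and its sample rows are assumptions for illustration only. Unlike Example-C.2, it keeps every detail row and appends the per-month distinct count to each, which is what COUNT(DISTINCT ...) OVER (PARTITION BY [Mth]) would have returned had it been accepted.

```sql
-- UserActivity and its rows are hypothetical sample data.
WITH UserActivity (Mth, UserAccountKey) AS (
    SELECT 201301, 10 UNION ALL
    SELECT 201301, 10 UNION ALL
    SELECT 201301, 20 UNION ALL
    SELECT 201302, 10
)
SELECT Mth,
       UserAccountKey,
       DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey ASC)
     + DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey DESC)
     - 1 AS NumUsers   -- ascending rank + descending rank - 1 = distinct values
FROM UserActivity;
-- With the sample rows, each of the three 201301 rows gets NumUsers = 2
-- and the single 201302 row gets NumUsers = 1.
```

A caveat worth checking against your own data: DENSE_RANK ranks NULL like any other value, so a partition containing NULL keys would be counted one higher than COUNT(DISTINCT ...) would report, since COUNT ignores NULLs.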