"Partition Function COUNT() OVER possible using DISTINCT"

Question

I'm trying to write the following in order to get a running total of distinct NumUsers, like so:

NumUsers = COUNT(DISTINCT [UserAccountKey]) OVER (PARTITION BY [Mth])

Management Studio doesn't seem too happy about this. The error disappears when I remove the DISTINCT keyword, but then it won't be a distinct count.

DISTINCT does not appear to be possible within the partition functions. How do I go about finding the distinct count? Do I use a more traditional method such as a correlated subquery?

Looking into this a bit further, maybe these OVER functions work differently from Oracle's, in that they cannot be used in SQL Server to calculate running totals.

I've added a live example here on SQLfiddle where I attempt to use a partition function to calculate a running total.
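
The correlated subquery mentioned in the question could look roughly like the sketch below. It is only an illustration under stated assumptions: the table name `Logins` and the sample rows are made up (only `Mth` and `UserAccountKey` come from the question), and it runs against SQLite through Python's standard `sqlite3` module as a stand-in for SQL Server. The query itself uses plain SQL that should carry over, but that has not been checked against a SQL Server instance here.

```python
import sqlite3

# Hypothetical sample data: the table name `Logins` and these rows are
# assumptions; only the column names come from the question.
rows = [
    ("2012-01", 1), ("2012-01", 1), ("2012-01", 2),
    ("2012-02", 1), ("2012-02", 3), ("2012-02", 3), ("2012-02", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logins (Mth TEXT, UserAccountKey INTEGER)")
conn.executemany("INSERT INTO Logins VALUES (?, ?)", rows)

# Correlated subquery: for every detail row, count the distinct users
# that share that row's month.
query = """
SELECT l.Mth,
       l.UserAccountKey,
       (SELECT COUNT(DISTINCT i.UserAccountKey)
        FROM Logins AS i
        WHERE i.Mth = l.Mth) AS NumUsers
FROM Logins AS l
ORDER BY l.Mth, l.UserAccountKey
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With this sample data every 2012-01 row carries a NumUsers of 2 and every 2012-02 row carries 3, which is the per-month distinct count the windowed `COUNT(DISTINCT ...)` was meant to produce.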

Answer 5

There is a solution in simple SQL:

SELECT COUNT(DISTINCT user) OVER(ORDER BY time) AS users
FROM users

=>

SELECT COUNT(*) OVER(ORDER BY time) AS users
FROM (
    SELECT user, MIN(time) AS time
    FROM users
    GROUP BY user
) t
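
To see what this rewrite returns, here is a small sketch run against SQLite via Python's standard `sqlite3` module (an assumption; the answer does not name a database). The sample rows are invented, and `"user"` and `"time"` are double-quoted because they clash with reserved words or built-ins in some dialects. Note that the rewritten query returns one row per user (at that user's first `time`), not one row per original row. The second query is a variant, not from the answer, that keeps every original row by flagging each user's first row with ROW_NUMBER and taking a running sum of the flags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE users ("user" TEXT, "time" INTEGER)')
# Hypothetical sample rows, not taken from the answer.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", 1), ("bob", 2), ("alice", 3), ("carol", 4), ("bob", 5)],
)

# The rewrite from the answer: reduce to each user's first appearance,
# then a plain running COUNT(*) gives the running distinct count.
first_seen = conn.execute("""
    SELECT "time", COUNT(*) OVER (ORDER BY "time") AS users
    FROM (
        SELECT "user", MIN("time") AS "time"
        FROM users
        GROUP BY "user"
    ) AS t
    ORDER BY "time"
""").fetchall()

# Variant that keeps every original row: rn = 1 marks a user's first row,
# and the running sum of those marks is the running distinct count.
per_row = conn.execute("""
    SELECT "user", "time",
           SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END)
               OVER (ORDER BY "time") AS users
    FROM (
        SELECT "user", "time",
               ROW_NUMBER() OVER (PARTITION BY "user" ORDER BY "time") AS rn
        FROM users
    ) AS t
    ORDER BY "time"
""").fetchall()

print(first_seen)
print(per_row)
```

In SQL Server, an aggregate with `OVER (ORDER BY ...)` needs version 2012 or later, so both queries assume at least that version.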

Answer 6

I wandered in here with essentially the same question as whytheq and found David's solution, but then had to review my old self-tutorial notes regarding DENSE_RANK because I use it so rarely: why DENSE_RANK instead of RANK or ROW_NUMBER, and how does it actually work? In the process, I updated that tutorial to include David's solution for this particular problem, and then thought it might be helpful for SQL newbies (or others like me who forget stuff).

The whole tutorial text can be copy/pasted into a query editor and then each example query can be (separately) uncommented and run, to see their respective results. (By default, the solution to this problem is uncommented at the bottom.) Or, each example can be copied separately into its own query edit instance, but the TBLx CTE must be included in each.

--WITH /* DB2 version */
--TBLx (Col_A, Col_B) AS (VALUES 
--     (  1,     1  ),
--     (  1,     1  ),
--     (  1,     1  ),
--     (  1,     2  ))

WITH /* SQL-Server version */
TBLx    (Col_A, Col_B) AS
  (SELECT  1,     1    UNION ALL
   SELECT  1,     1    UNION ALL
   SELECT  1,     1    UNION ALL
   SELECT  1,     2)

/*** Example-A: demonstrates the difference between ROW_NUMBER, RANK and DENSE_RANK ***/

  --SELECT Col_A, Col_B,
  --  ROW_NUMBER() OVER(PARTITION BY Col_A ORDER BY Col_B) AS ROW_NUMBER_,
  --  RANK() OVER(PARTITION BY Col_A ORDER BY Col_B)       AS RANK_,
  --  DENSE_RANK() OVER(PARTITION BY Col_A ORDER BY Col_B) AS DENSE_RANK_
  --FROM TBLx

  /* RESULTS:    
    Col_A  Col_B  ROW_NUMBER_  RANK_  DENSE_RANK_
      1      1        1          1        1
      1      1        2          1        1
      1      1        3          1        1
      1      2        4          4        2

     ROW_NUMBER: Just increments for the three identical rows and increments again for the final unique row.
                 That is, it’s an order-value (based on "sort" order) but makes no other distinction.
                 
           RANK: Assigns the same rank value to the three identical rows, then jumps to 4 for the fourth row,
                 which is *unique* with regard to the others.
                 That is, each identical row is ranked by the rank-order of the first row-instance of that
                 (identical) value-set.
                 
     DENSE_RANK: Also assigns the same rank value to the three identical rows but the fourth *unique* row is
                 assigned a value of 2.
                 That is, DENSE_RANK identifies that there are (only) two *unique* row-types in the row set.
  */

/*** Example-B: to get only the distinct resulting "count-of-each-row-type" rows ***/

--  SELECT DISTINCT -- For unique returned "count-of-each-row-type" rows, the DISTINCT operator is necessary because
--                  -- the calculated DENSE_RANK value is appended to *all* rows in the data set.  Without DISTINCT,
--                  -- its value for each original-data row-type will just be replicated for each of those rows.
--                  
--    Col_A, Col_B,                
--    DENSE_RANK() OVER(PARTITION BY Col_A ORDER BY Col_B) AS DISTINCT_ROWTYPE_COUNT_
--  FROM TBLx

  /* RESULTS:
    Col_A  Col_B  DISTINCT_ROWTYPE_COUNT_
      1      1            1
      1      2            2
  */

/*** Example-C.1: demonstrates the derivation of the "count-of-all-row-types" (finalized in Example-C.2, below) ***/

--  SELECT
--    Col_A, Col_B,
--    
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC) AS ROW_TYPES_COUNT_ASC_,
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC) AS ROW_TYPES_COUNT_DESC_,
--    
--    -- Adding the above cases together and subtracting one gives the same total count on each resulting row:
--    
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC)
--       +
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC)
--      - 1   /* (Because DENSE_RANK values are one-based) */
--      AS ROW_TYPES_COUNT_
--  FROM TBLx

  /* RESULTS:
    COL_A  COL_B  ROW_TYPES_COUNT_ASC_  ROW_TYPES_COUNT_DESC_  ROW_TYPES_COUNT_
      1      2            2                     1                    2
      1      1            1                     2                    2
      1      1            1                     2                    2
      1      1            1                     2                    2
  */

/*** Example-C.2: uses the above technique to get a *single* resulting "count-of-all-row-types" row ***/

  SELECT DISTINCT -- For a single returned "count-of-all-row-types" row, the DISTINCT operator is necessary because the
                  -- calculated DENSE_RANK value is appended to *all* rows in the data set.  Without DISTINCT, that
                  -- value will just be replicated for each original-data row.
                  
  --  Col_A, Col_B, -- In order to get a *single* returned "count-of-all-row-types" row (and field), all other fields
                    -- must be excluded because their respective differing row-values will defeat the purpose of the
                    -- DISTINCT operator, above.
                   
    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC)
       +
    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC)
      - 1   /* (Because DENSE_RANK values are one-based) */
      AS ROW_TYPES_COUNT_
  FROM TBLx
  
  /* RESULTS:

    ROW_TYPES_COUNT_
          2
  */
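
Applied to the original question, the same ascending-plus-descending DENSE_RANK technique might look like the sketch below. The assumptions are the same as before: a made-up table `Logins` with invented rows (only `Mth` and `UserAccountKey` come from the question), run on SQLite through Python's standard `sqlite3` module instead of SQL Server. One caveat worth knowing: DENSE_RANK ranks a NULL key like any other value, so NULLs add one to the result, whereas COUNT(DISTINCT ...) ignores them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logins (Mth TEXT, UserAccountKey INTEGER)")
# Hypothetical sample rows; none of them has a NULL UserAccountKey.
conn.executemany(
    "INSERT INTO Logins VALUES (?, ?)",
    [("2012-01", 1), ("2012-01", 1), ("2012-01", 2),
     ("2012-02", 1), ("2012-02", 3), ("2012-02", 3), ("2012-02", 4)],
)

# Within each month, the ascending rank plus the descending rank of a key
# always equals (number of distinct keys + 1), so subtracting 1 gives the
# distinct count on every row of the partition.
result = conn.execute("""
    SELECT Mth, UserAccountKey,
           DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey ASC)
         + DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey DESC)
         - 1 AS NumUsers
    FROM Logins
    ORDER BY Mth, UserAccountKey
""").fetchall()

for row in result:
    print(row)
```

Unlike Example-C.2, this keeps the detail rows and omits DISTINCT, so NumUsers simply repeats on every row of its month (2 for 2012-01 and 3 for 2012-02 with this data).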

6 answers

Answer 1
183 accepted 2014-03-12 09:45:51

Answer 2
6 2015-12-18 09:37:31

Answer 3
5 2012-06-26 08:20:20

Answer 4
5 2015-02-22 07:55:40

Answer 5
0 2021-12-28 22:42:58

Answer 6
0 2022-02-02 20:02:48
