Partition Function COUNT() OVER possible using DISTINCT

I'm trying to write the following in order to get a running total of distinct NumUsers, like so:

NumUsers = COUNT(DISTINCT [UserAccountKey]) OVER (PARTITION BY [Mth])

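Management Studio doesn't seem too happy about this. The error disappears when I remove the DISTINCT keyword, but then it won't be a distinct count.

DISTINCT does not appear to be possible within the partition functions. How do I go about finding the distinct count? Do I use a more traditional method, such as a correlated subquery?

Looking into this a bit further, maybe these OVER functions work differently from Oracle's, in that they cannot be used in SQL Server to calculate running totals.

I've added a live example on SQLfiddle where I attempt to use a partition function to calculate a running total.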
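The correlated-subquery route mentioned above could look something like the following. This is a minimal sketch under stated assumptions:

- The SQL runs through Python's built-in `sqlite3` module, not SQL Server.
- The `UserMonths` table and its rows are hypothetical stand-ins for the question's data. The SQLfiddle example is not reproduced here.

The first query attaches each month's distinct-user count to every row. The second computes a running distinct-user total per month by counting over all months up to and including the current one.

```python
import sqlite3

# Hypothetical sample data standing in for the question's table:
# one row per (month, user) event; a user may appear several times per month.
ROWS = [(1, 10), (1, 10), (1, 20), (2, 20), (2, 30), (3, 10)]

# Distinct users in each row's month, attached to every row (what the
# COUNT(DISTINCT ...) OVER (PARTITION BY [Mth]) attempt was after).
PER_MONTH_SQL = """
SELECT t.Mth, t.UserAccountKey,
       (SELECT COUNT(DISTINCT s.UserAccountKey)
          FROM UserMonths AS s
         WHERE s.Mth = t.Mth) AS NumUsers
FROM UserMonths AS t
ORDER BY t.Mth, t.UserAccountKey
"""

# Running total of distinct users from the first month up to each month.
RUNNING_SQL = """
SELECT m.Mth,
       (SELECT COUNT(DISTINCT s.UserAccountKey)
          FROM UserMonths AS s
         WHERE s.Mth <= m.Mth) AS RunningNumUsers
FROM (SELECT DISTINCT Mth FROM UserMonths) AS m
ORDER BY m.Mth
"""


def run(sql):
    """Load the sample rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE UserMonths (Mth INTEGER, UserAccountKey INTEGER)")
    conn.executemany("INSERT INTO UserMonths VALUES (?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run(PER_MONTH_SQL):
        print(row)
    for row in run(RUNNING_SQL):
        print(row)
```

Both queries use only scalar subqueries and `COUNT(DISTINCT ...)`, so they should also be accepted by SQL Server. However, every output row triggers its own subquery, which can get slow on large tables.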

There is a solution in plain SQL:

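-- Not allowed (DISTINCT inside a windowed aggregate):
SELECT COUNT(DISTINCT [user]) OVER (ORDER BY [time]) AS users
FROM users

-- Rewrite: keep each user only at their first appearance, then count rows
SELECT COUNT(*) OVER (ORDER BY [time]) AS users
FROM (
    SELECT [user], MIN([time]) AS [time]
    FROM users
    GROUP BY [user]
) t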
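One way to convince yourself the rewrite is sound is to compare it with a brute-force running distinct count. This is a sketch, assuming Python's `sqlite3` module with SQLite 3.25 or later, which window functions require. The `events` table and its `user_id`/`event_time` columns are hypothetical stand-ins for the answer's `users`/`user`/`time`.

```python
import sqlite3

# Window functions need SQLite 3.25.0 or later.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("SQLite >= 3.25.0 is required for window functions")

# Hypothetical sample events: (user_id, event_time); users repeat over time.
EVENTS = [("a", 1), ("b", 1), ("a", 2), ("c", 3), ("b", 3), ("a", 4)]

# The answer's rewrite: one row per user at its first appearance,
# then a plain running COUNT(*) gives the running distinct-user count.
REWRITE_SQL = """
SELECT user_id, first_time,
       COUNT(*) OVER (ORDER BY first_time) AS users
FROM (
    SELECT user_id, MIN(event_time) AS first_time
    FROM events
    GROUP BY user_id
) AS t
ORDER BY first_time, user_id
"""

# Brute-force reference: distinct users seen up to and including each time.
REFERENCE_SQL = """
SELECT DISTINCT e.event_time,
       (SELECT COUNT(DISTINCT s.user_id)
          FROM events AS s
         WHERE s.event_time <= e.event_time) AS users
FROM events AS e
ORDER BY e.event_time
"""


def query(sql):
    """Load the sample events into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, event_time INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", EVENTS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


def rewrite_matches_reference():
    """True if every rewrite row agrees with the brute-force count at that time."""
    reference = dict(query(REFERENCE_SQL))  # event_time -> running distinct count
    return all(reference[t] == users for _, t, users in query(REWRITE_SQL))


if __name__ == "__main__":
    print(query(REWRITE_SQL))
    print(rewrite_matches_reference())
```

The rewrite returns one row per user, placed at that user's first appearance, rather than one row per original row. Times at which no new user appears (2 and 4 in this sample) therefore have no row of their own. Users who share the same first time are peers under `ORDER BY first_time`, and the default frame (`RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`) includes peers, so they all get the same count.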

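I wandered in here with essentially the same question as whytheq and found David's solution, but then had to go back over my old self-tutorial notes on DENSE_RANK because I use it so rarely: why DENSE_RANK rather than RANK or ROW_NUMBER, and how does it actually work? In the process, I updated that tutorial to include David's solution for this particular problem. I then thought it might be helpful for SQL newbies (or others like me who forget things).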

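The entire tutorial text can be copied/pasted into a query editor. Each example query can then be uncommented and run individually to see its results. (By default, the solution to this question is left uncommented at the bottom.) Alternatively, each example can be copied into its own query-editor instance, but the TBLx CTE must be included with each one.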

--WITH /* DB2 version */
--TBLx (Col_A, Col_B) AS (VALUES 
--     (  1,     1  ),
--     (  1,     1  ),
--     (  1,     1  ),
--     (  1,     2  ))

WITH /* SQL-Server version */
TBLx    (Col_A, Col_B) AS
  (SELECT  1,     1    UNION ALL
   SELECT  1,     1    UNION ALL
   SELECT  1,     1    UNION ALL
   SELECT  1,     2)

/*** Example-A: demonstrates the difference between ROW_NUMBER, RANK and DENSE_RANK ***/

  --SELECT Col_A, Col_B,
  --  ROW_NUMBER() OVER(PARTITION BY Col_A ORDER BY Col_B) AS ROW_NUMBER_,
  --  RANK() OVER(PARTITION BY Col_A ORDER BY Col_B)       AS RANK_,
  --  DENSE_RANK() OVER(PARTITION BY Col_A ORDER BY Col_B) AS DENSE_RANK_
  --FROM TBLx

  /* RESULTS:    
    Col_A  Col_B  ROW_NUMBER_  RANK_  DENSE_RANK_
      1      1        1          1        1
      1      1        2          1        1
      1      1        3          1        1
      1      2        4          4        2

     ROW_NUMBER: Just increments for the three identical rows and increments again for the final unique row.
                 That is, it’s an order-value (based on "sort" order) but makes no other distinction.
                 
           RANK: Assigns the same rank value to the three identical rows, then jumps to 4 for the fourth row,
                 which is *unique* with regard to the others.
                 That is, each identical row is ranked by the rank-order of the first row-instance of that
                 (identical) value-set.
                 
     DENSE_RANK: Also assigns the same rank value to the three identical rows but the fourth *unique* row is
                 assigned a value of 2.
                 That is, DENSE_RANK identifies that there are (only) two *unique* row-types in the row set.
  */

/*** Example-B: to get only the distinct resulting "count-of-each-row-type" rows ***/

--  SELECT DISTINCT -- For unique returned "count-of-each-row-type" rows, the DISTINCT operator is necessary because
--                  -- the calculated DENSE_RANK value is appended to *all* rows in the data set.  Without DISTINCT,
--                  -- its value for each original-data row-type will just be replicated for each of those rows.
--                  
--    Col_A, Col_B,                
--    DENSE_RANK() OVER(PARTITION BY Col_A ORDER BY Col_B) AS DISTINCT_ROWTYPE_COUNT_
--  FROM TBLx

  /* RESULTS:
    Col_A  Col_B  DISTINCT_ROWTYPE_COUNT_
      1      1            1
      1      2            2
  */

/*** Example-C.1: demonstrates the derivation of the "count-of-all-row-types" (finalized in Example-C.2, below) ***/

--  SELECT
--    Col_A, Col_B,
--    
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC) AS ROW_TYPES_COUNT_ASC_,
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC) AS ROW_TYPES_COUNT_DESC_,
--    
--    -- Adding the above cases together and subtracting one gives the same total count on each resulting row:
--    
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC)
--       +
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC)
--      - 1   /* (Because DENSE_RANK values are one-based) */
--      AS ROW_TYPES_COUNT_
--  FROM TBLx

  /* RESULTS:
    COL_A  COL_B  ROW_TYPES_COUNT_ASC_  ROW_TYPES_COUNT_DESC_  ROW_TYPES_COUNT_
      1      2            2                     1                    2
      1      1            1                     2                    2
      1      1            1                     2                    2
      1      1            1                     2                    2
  */

/*** Example-C.2: uses the above technique to get a *single* resulting "count-of-all-row-types" row ***/

  SELECT DISTINCT -- For a single returned "count-of-all-row-types" row, the DISTINCT operator is necessary because the
                  -- calculated DENSE_RANK value is appended to *all* rows in the data set.  Without DISTINCT, that
                  -- value will just be replicated for each original-data row.
                  
  --  Col_A, Col_B, -- In order to get a *single* returned "count-of-all-row-types" row (and field), all other fields
                    -- must be excluded because their respective differing row-values will defeat the purpose of the
                    -- DISTINCT operator, above.
                   
    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC)
       +
    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC)
      - 1   /* (Because DENSE_RANK values are one-based) */
      AS ROW_TYPES_COUNT_
  FROM TBLx
  
  /* RESULTS:

    ROW_TYPES_COUNT_
          2
  */
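The tutorial's results can be checked outside SQL Server with a sketch like the one below, which also applies the same DENSE_RANK trick to the question's per-month partition. It assumes Python's `sqlite3` module with SQLite 3.25 or later. `TBLx` is loaded as an ordinary table instead of a CTE, and `UserMonths` with its rows is a hypothetical stand-in for the question's data. The per-month result is compared against a plain `GROUP BY ... COUNT(DISTINCT ...)`.

```python
import sqlite3

# DENSE_RANK and the other window functions need SQLite 3.25.0 or later.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("SQLite >= 3.25.0 is required for window functions")

# The tutorial's TBLx rows, loaded as a plain table instead of a CTE.
TBLX = [(1, 1), (1, 1), (1, 1), (1, 2)]
# Hypothetical per-month data shaped like the question: (Mth, UserAccountKey).
USER_MONTHS = [(1, 10), (1, 10), (1, 20), (2, 20), (2, 30), (3, 10)]

# Example-A, sorted by ROW_NUMBER_ so the output order is deterministic.
EXAMPLE_A_SQL = """
SELECT Col_A, Col_B,
       ROW_NUMBER() OVER (PARTITION BY Col_A ORDER BY Col_B) AS ROW_NUMBER_,
       RANK()       OVER (PARTITION BY Col_A ORDER BY Col_B) AS RANK_,
       DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B) AS DENSE_RANK_
FROM TBLx
ORDER BY ROW_NUMBER_
"""

# Example-C.2: ascending rank + descending rank - 1 = number of distinct values.
EXAMPLE_C2_SQL = """
SELECT DISTINCT
       DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC)
     + DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC)
     - 1 AS ROW_TYPES_COUNT_
FROM TBLx
"""

# The same trick partitioned by month: distinct users per month.
PER_MONTH_SQL = """
SELECT DISTINCT Mth,
       DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey ASC)
     + DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey DESC)
     - 1 AS NumUsers
FROM UserMonths
ORDER BY Mth
"""

# Reference result computed without window functions.
GROUP_BY_SQL = """
SELECT Mth, COUNT(DISTINCT UserAccountKey) AS NumUsers
FROM UserMonths
GROUP BY Mth
ORDER BY Mth
"""


def query(sql):
    """Load both sample tables into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TBLx (Col_A INTEGER, Col_B INTEGER)")
    conn.executemany("INSERT INTO TBLx VALUES (?, ?)", TBLX)
    conn.execute("CREATE TABLE UserMonths (Mth INTEGER, UserAccountKey INTEGER)")
    conn.executemany("INSERT INTO UserMonths VALUES (?, ?)", USER_MONTHS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for sql in (EXAMPLE_A_SQL, EXAMPLE_C2_SQL, PER_MONTH_SQL, GROUP_BY_SQL):
        print(query(sql))
```

One caveat: `COUNT(DISTINCT ...)` ignores NULLs, but DENSE_RANK ranks NULLs together as a value of their own. A partition that contains NULL keys therefore comes out one higher with this trick than `COUNT(DISTINCT ...)` would report.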

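Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM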