
Get DISTINCT COUNT in one pass in SQL Server

I have a table like the following:

Region    Country    Manufacturer    Brand    Period    Spend
R1        C1         M1              B1       2016      5
R1        C1         M1              B1       2017      10
R1        C1         M1              B1       2017      20
R1        C1         M1              B2       2016      15
R1        C1         M1              B3       2017      20
R1        C2         M1              B1       2017      5
R1        C2         M2              B4       2017      25
R1        C2         M2              B5       2017      30
R2        C3         M1              B1       2017      35
R2        C3         M2              B4       2017      40
R2        C3         M2              B5       2017      45
...

I wrote the query below to aggregate it:

SELECT [Region]
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,Period
    ,SUM([Spend]) AS [Spend]
FROM myTable
GROUP BY [Region]
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,[Period]
ORDER BY 1,2,3,4

The result looks like this:

Region    Country    Manufacturer    Brand    Period    Spend
R1        C1         M1              B1       2016      5
R1        C1         M1              B1       2017      30 -- this row is an aggregate from raw table above
R1        C1         M1              B2       2016      15
R1        C1         M1              B3       2017      20
R1        C2         M1              B1       2017      5  -- aggregated result
R1        C2         M2              B4       2017      25
R1        C2         M2              B5       2017      30
R2        C3         M1              B1       2017      35
R2        C3         M2              B4       2017      40
R2        C3         M2              B5       2017      45

I want to add another column to the table above that shows the DISTINCT COUNT of Brand, grouped by Region, Country, Manufacturer and Period. The final table would look like this:

Region    Country    Manufacturer    Brand    Period    Spend    UniqBrandCount
R1        C1         M1              B1       2016      5        2 -- two brands by R1, C1, M1 in 2016
R1        C1         M1              B1       2017      30       2
R1        C1         M1              B2       2016      15       2 -- same as first row's result
R1        C1         M1              B3       2017      20       2
R1        C2         M1              B1       2017      5        1
R1        C2         M2              B4       2017      25       2
R1        C2         M2              B5       2017      30       2
R2        C3         M1              B1       2017      35       1
R2        C3         M2              B4       2017      40       2
R2        C3         M2              B5       2017      45       2

I know how to get the final result in three steps.

  1. Run the following query (Query 1):

    SELECT [Region], [Country], [Manufacturer], [Period], COUNT(DISTINCT [Brand]) AS [BrandCount]
    INTO Temp1 FROM myTable GROUP BY [Region], [Country], [Manufacturer], [Period]

  2. Run this query (Query 2):

    SELECT [Region], [Country], [Manufacturer], [Brand], YEAR([Period]) AS Period, SUM([Spend]) AS [Spend]
    INTO Temp2 FROM myTable GROUP BY [Region], [Country], [Manufacturer], [Brand], [Period]

  3. Then LEFT JOIN Temp2 to Temp1, bringing in [BrandCount] from the latter, as follows:

    SELECT a.*, b.[BrandCount] FROM Temp2 AS a LEFT JOIN Temp1 AS b
    ON a.[Region] = b.[Region] AND a.[Country] = b.[Country] AND a.[Manufacturer] = b.[Manufacturer] AND a.[Period] = b.[Period]
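To check the three-step approach against the sample rows, here is a rough sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. That substitution is an assumption: SQLite has no SELECT ... INTO and no YEAR(), so the sketch uses CREATE TEMP TABLE ... AS and keeps Period as the integer year shown in the sample. The helper names make_db and three_step are hypothetical, and the "..." rows are left out because their contents are unknown.

```python
import sqlite3

# Sample rows from the question's raw table (the "..." rows are unknown, so omitted).
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]


def make_db():
    """Hypothetical helper: an in-memory SQLite copy of myTable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
        " Brand TEXT, Period INTEGER, Spend INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def three_step(conn):
    # Query 1: distinct brands per Region/Country/Manufacturer/Period.
    conn.execute("""
        CREATE TEMP TABLE Temp1 AS
        SELECT [Region], [Country], [Manufacturer], [Period],
               COUNT(DISTINCT [Brand]) AS [BrandCount]
        FROM myTable
        GROUP BY [Region], [Country], [Manufacturer], [Period]""")
    # Query 2: spend per brand (no YEAR(): Period is already a year in the sample).
    conn.execute("""
        CREATE TEMP TABLE Temp2 AS
        SELECT [Region], [Country], [Manufacturer], [Brand], [Period],
               SUM([Spend]) AS [Spend]
        FROM myTable
        GROUP BY [Region], [Country], [Manufacturer], [Brand], [Period]""")
    # Step 3: bring BrandCount from Temp1 onto each Temp2 row.
    return conn.execute("""
        SELECT a.*, b.[BrandCount]
        FROM Temp2 AS a
        LEFT JOIN Temp1 AS b
          ON a.[Region] = b.[Region] AND a.[Country] = b.[Country]
         AND a.[Manufacturer] = b.[Manufacturer] AND a.[Period] = b.[Period]
        ORDER BY 1, 2, 3, 4, 5""").fetchall()


if __name__ == "__main__":
    for row in three_step(make_db()):
        print(row)
```

Each returned tuple is (Region, Country, Manufacturer, Brand, Period, Spend, BrandCount); it works, but it scans myTable twice and materialises two temp tables, which is what the answers below try to avoid.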

I'm sure there's a more efficient way to do this, right? Thanks in advance for any suggestions/answers!

The tag on your question, window-functions, suggests you already have a good idea.

For the number of distinct brands grouped by Region, Country, Manufacturer and Period, you could write:

Select   Region 
        ,Country
        ,Manufacturer
        ,Brand
        ,Period
        ,Spend
        ,DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand asc) 
         + DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand desc) 
         -1 UniqBrandCount
From myTable T1
Order By 1,2,3,4

Borrowed heavily from this question: https://dba.stackexchange.com/questions/89031/using-distinct-in-window-function-with-over

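COUNT(DISTINCT ...) isn't allowed with an OVER clause, hence dense_rank. Rank the brands in ascending and in descending order, add the two ranks, and subtract 1 to get the distinct count.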
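To see the double dense_rank trick on the sample rows, here is a small sketch that runs the query above through Python's sqlite3 module. Running it on SQLite instead of SQL Server is an assumption (DENSE_RANK needs SQLite 3.25 or later), and make_db and double_dense_rank are hypothetical helper names. Like the query, it returns one row per raw row, so both B1/2017 rows carry the same count.

```python
import sqlite3

# Sample rows from the question's raw table (the "..." rows are omitted).
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]


def make_db():
    """Hypothetical helper: an in-memory SQLite copy of myTable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
        " Brand TEXT, Period INTEGER, Spend INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def double_dense_rank(conn):
    # Ascending rank + descending rank - 1 = distinct brands in the partition.
    return conn.execute("""
        SELECT Region, Country, Manufacturer, Brand, Period, Spend,
               DENSE_RANK() OVER (PARTITION BY Region, Country, Manufacturer, Period
                                  ORDER BY Brand ASC)
             + DENSE_RANK() OVER (PARTITION BY Region, Country, Manufacturer, Period
                                  ORDER BY Brand DESC)
             - 1 AS UniqBrandCount
        FROM myTable
        ORDER BY 1, 2, 3, 4""").fetchall()


if __name__ == "__main__":
    for row in double_dense_rank(make_db()):
        print(row)
```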

Your SUM can also be rewritten with PARTITION BY logic. That way you can use a different grouping level for each aggregate:

SELECT 
[Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
,dense_rank() OVER 
    (PARTITION BY 
     [Region] 
    ,[Country]
    ,[Manufacturer]
    ,[Period] Order by Brand) 
+ dense_rank() OVER 
    (PARTITION BY 
     [Region] 
    ,[Country]
    ,[Manufacturer]
    ,[Period] Order by Brand Desc) 
- 1  
AS [BrandCount]
,SUM([Spend]) OVER
    (PARTITION BY
     [Region] 
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,[Period]) as [Spend]
from
myTable
ORDER BY 1,2,3,4

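You will then probably want to reduce the number of rows in the output, because this syntax returns as many rows as myTable, with each aggregate total repeated on every row it applies to: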

Region  Country  Manufacturer  Brand  Period  BrandCount  Spend
R1  C1  M1  B1  2016    2   5
R1  C1  M1  B1  2017    2   30 --dup1
R1  C1  M1  B1  2017    2   30 --dup1
R1  C1  M1  B2  2016    2   15
R1  C1  M1  B3  2017    2   20
R1  C2  M1  B1  2017    1   5
R1  C2  M2  B4  2017    2   25
R1  C2  M2  B5  2017    2   30
R2  C3  M1  B1  2017    1   35
R2  C3  M2  B4  2017    2   40
R2  C3  M2  B5  2017    2   45

Selecting the DISTINCT rows from this output gives you what you need.
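A minimal sketch of that last step, again assuming SQLite 3.25+ through Python's sqlite3 module as a stand-in for SQL Server: the windowed query from this answer is wrapped with SELECT DISTINCT, so the duplicated B1/2017 row collapses and ten rows remain. The ORDER BY is extended with the fifth column so the row order is deterministic; make_db and distinct_windowed are hypothetical names.

```python
import sqlite3

# Sample rows from the question's raw table (the "..." rows are omitted).
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]


def make_db():
    """Hypothetical helper: an in-memory SQLite copy of myTable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
        " Brand TEXT, Period INTEGER, Spend INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def distinct_windowed(conn):
    # The windows are computed per raw row; DISTINCT then drops the repeated rows.
    return conn.execute("""
        SELECT DISTINCT
               [Region], [Country], [Manufacturer], [Brand], [Period],
               DENSE_RANK() OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period]
                                  ORDER BY Brand)
             + DENSE_RANK() OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period]
                                  ORDER BY Brand DESC)
             - 1 AS [BrandCount],
               SUM([Spend]) OVER (PARTITION BY [Region], [Country], [Manufacturer],
                                  [Brand], [Period]) AS [Spend]
        FROM myTable
        ORDER BY 1, 2, 3, 4, 5""").fetchall()


if __name__ == "__main__":
    for row in distinct_windowed(make_db()):
        print(row)
```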

How the dense_rank trick works

Consider the following data:

Col1    Col2
B       1
B       1
B       3
B       5
B       7
B       9

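dense_rank() ranks each row as the number of distinct values before it, plus 1:

1 -> 1, 3 -> 2, 5 -> 3, 7 -> 4, 9 -> 5

Ranking in the reverse order (with desc) produces the opposite pattern:

1 -> 5, 3 -> 4, 5 -> 3, 7 -> 2, 9 -> 1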

Adding these ranks together gives the same value on every row:

1 + 5 = 2 + 4 = 3 + 3 = 4 + 2 = 5 + 1 = 6

Putting it into words helps here:

(number of distinct items before + 1) + (number of distinct items after + 1) 
= number of distinct OTHER items before AND after + 2 
= Total number of distinct items + 1

So, to get the total number of distinct items, add the ascending and descending dense_ranks together and subtract 1.
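The same arithmetic can be checked on the Col1/Col2 example with a short sketch. It assumes SQLite 3.25+ via Python's sqlite3 module rather than SQL Server, and the table name t and the function name rank_table are made up for the illustration. Every row's ascending rank plus descending rank minus 1 should come out as 5, the number of distinct Col2 values.

```python
import sqlite3


def rank_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (Col1 TEXT, Col2 INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [("B", v) for v in (1, 1, 3, 5, 7, 9)])
    # Rank each row forwards and backwards within its Col1 partition.
    return conn.execute("""
        SELECT Col2,
               DENSE_RANK() OVER (PARTITION BY Col1 ORDER BY Col2 ASC) AS asc_rank,
               DENSE_RANK() OVER (PARTITION BY Col1 ORDER BY Col2 DESC) AS desc_rank
        FROM t
        ORDER BY Col2""").fetchall()


if __name__ == "__main__":
    for col2, asc_rank, desc_rank in rank_table():
        # asc_rank + desc_rank is 6 on every row, so the distinct count is 6 - 1 = 5.
        print(col2, asc_rank, desc_rank, asc_rank + desc_rank - 1)
```

Note that the duplicated value 1 gets the same rank in both directions, which is why duplicates do not inflate the count.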

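The double dense_rank idea means you need two sorts (assuming no index exists that provides the sort order). Assuming there are no NULL brands (which that idea also assumes), you can use a single dense_rank and a windowed MAX, as below (demo):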

WITH T1
     AS (SELECT *,
                DENSE_RANK() OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period] ORDER BY Brand) AS [dr]
         FROM   myTable),
     T2
     AS (SELECT *,
                MAX([dr]) OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period]) AS UniqBrandCount
         FROM   T1)
SELECT [Region],
       [Country],
       [Manufacturer],
       [Brand],
       Period,
       SUM([Spend])        AS [Spend],
       MAX(UniqBrandCount) AS UniqBrandCount
FROM   T2
GROUP  BY [Region],
          [Country],
          [Manufacturer],
          [Brand],
          [Period]
ORDER  BY [Region],
          [Country],
          [Manufacturer],
          [Period],
          Brand 
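For comparison, a sketch of this single-dense_rank version on the sample rows, once more assuming SQLite 3.25+ through Python's sqlite3 module in place of SQL Server (so it shows the results only, not SQL Server's spools or sorts). As noted above, MAX of the dense_rank equals the distinct count only if Brand has no NULLs. make_db and single_dense_rank are hypothetical names.

```python
import sqlite3

# Sample rows from the question's raw table (the "..." rows are omitted).
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]


def make_db():
    """Hypothetical helper: an in-memory SQLite copy of myTable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
        " Brand TEXT, Period INTEGER, Spend INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def single_dense_rank(conn):
    # dr numbers the brands 1..n within each group, so MAX(dr) over the group is n.
    return conn.execute("""
        WITH T1 AS (
            SELECT *,
                   DENSE_RANK() OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period]
                                      ORDER BY Brand) AS [dr]
            FROM myTable),
        T2 AS (
            SELECT *,
                   MAX([dr]) OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period])
                       AS UniqBrandCount
            FROM T1)
        SELECT [Region], [Country], [Manufacturer], [Brand], Period,
               SUM([Spend]) AS [Spend],
               MAX(UniqBrandCount) AS UniqBrandCount
        FROM T2
        GROUP BY [Region], [Country], [Manufacturer], [Brand], [Period]
        ORDER BY [Region], [Country], [Manufacturer], [Period], Brand""").fetchall()


if __name__ == "__main__":
    for row in single_dense_rank(make_db()):
        print(row)
```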

The above has some unavoidable spooling (it isn't possible to do this in a 100% streaming manner), but only a single sort.

Strangely, the final ORDER BY clause is needed to reduce the number of sorts to one (or to zero if there is a suitable index).



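Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.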

 