Get DISTINCT COUNT in one pass in SQL Server
I have a table like the following:
Region Country Manufacturer Brand Period Spend
R1 C1 M1 B1 2016 5
R1 C1 M1 B1 2017 10
R1 C1 M1 B1 2017 20
R1 C1 M1 B2 2016 15
R1 C1 M1 B3 2017 20
R1 C2 M1 B1 2017 5
R1 C2 M2 B4 2017 25
R1 C2 M2 B5 2017 30
R2 C3 M1 B1 2017 35
R2 C3 M2 B4 2017 40
R2 C3 M2 B5 2017 45
...
I wrote the query below to aggregate them:
SELECT [Region]
,[Country]
,[Manufacturer]
,[Brand]
,Period
,SUM([Spend]) AS [Spend]
FROM myTable
GROUP BY [Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
ORDER BY 1,2,3,4
The result is as follows:
Region Country Manufacturer Brand Period Spend
R1 C1 M1 B1 2016 5
R1 C1 M1 B1 2017 30 -- this row is an aggregate from raw table above
R1 C1 M1 B2 2016 15
R1 C1 M1 B3 2017 20
R1 C2 M1 B1 2017 5
R1 C2 M2 B4 2017 25
R1 C2 M2 B5 2017 30
R2 C3 M1 B1 2017 35
R2 C3 M2 B4 2017 40
R2 C3 M2 B5 2017 45
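To double-check the aggregated values (for example, the R1/C2/M1/B1/2017 group has only one raw row, so its Spend should be 5), here is a minimal sketch that runs the same GROUP BY against the sample rows. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, drops the square-bracket identifiers, and leaves out the rows elided by "..." in the question, so it only checks the logic on the visible data.

```python
import sqlite3

# Sample rows from the question; the rows elided by "..." are not included.
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]


def make_db(rows):
    """Create an in-memory SQLite copy of myTable holding the given rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
        " Brand TEXT, Period INTEGER, Spend INTEGER)"
    )
    con.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", rows)
    return con


con = make_db(ROWS)

# Same grouping as the question's query; Period is added to ORDER BY for a stable order.
aggregated = con.execute(
    """
    SELECT Region, Country, Manufacturer, Brand, Period, SUM(Spend) AS Spend
    FROM myTable
    GROUP BY Region, Country, Manufacturer, Brand, Period
    ORDER BY 1, 2, 3, 4, 5
    """
).fetchall()

for row in aggregated:
    print(*row)
```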
I want to add another column to the table above that shows the DISTINCT COUNT of Brand, grouped by Region, Country, Manufacturer and Period. The final table would then look like this:
Region Country Manufacturer Brand Period Spend UniqBrandCount
R1 C1 M1 B1 2016 5 2 -- two brands by R1, C1, M1 in 2016
R1 C1 M1 B1 2017 30 2 -- two brands (B1, B3) by R1, C1, M1 in 2017
R1 C1 M1 B2 2016 15 2 -- same as first row's result
R1 C1 M1 B3 2017 20 2
R1 C2 M1 B1 2017 5 1
R1 C2 M2 B4 2017 25 2
R1 C2 M2 B5 2017 30 2
R2 C3 M1 B1 2017 35 1
R2 C3 M2 B4 2017 40 2
R2 C3 M2 B5 2017 45 2
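I know how to get the final result in three steps.
Run the following query (Query 1):
SELECT [Region], [Country], [Manufacturer], [Period], COUNT(DISTINCT [Brand]) AS [BrandCount]
INTO Temp1
FROM myTable
GROUP BY [Region], [Country], [Manufacturer], [Period]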
Run this query (Query 2):
SELECT [Region], [Country], [Manufacturer], [Brand], [Period], SUM([Spend]) AS [Spend]
INTO Temp2
FROM myTable
GROUP BY [Region], [Country], [Manufacturer], [Brand], [Period]
Then LEFT JOIN Temp2 and Temp1 to bring in [BrandCount] from the latter, like this:
SELECT a.*, b.[BrandCount]
FROM Temp2 AS a
LEFT JOIN Temp1 AS b
ON a.[Region] = b.[Region]
AND a.[Country] = b.[Country]
AND a.[Manufacturer] = b.[Manufacturer]
AND a.[Period] = b.[Period]
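The three steps can be sketched end to end as below, assuming Python's sqlite3 module in place of SQL Server. SQLite has no SELECT ... INTO, so CREATE TEMP TABLE ... AS SELECT plays that role here; the Temp1/Temp2 names and the join follow the question, and the sample rows exclude the "..." rows.

```python
import sqlite3

# Sample rows from the question; the rows elided by "..." are not included.
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
    " Brand TEXT, Period INTEGER, Spend INTEGER)"
)
con.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Step 1: distinct brand count per Region/Country/Manufacturer/Period.
con.execute(
    """
    CREATE TEMP TABLE Temp1 AS
    SELECT Region, Country, Manufacturer, Period, COUNT(DISTINCT Brand) AS BrandCount
    FROM myTable
    GROUP BY Region, Country, Manufacturer, Period
    """
)

# Step 2: total spend per Region/Country/Manufacturer/Brand/Period.
con.execute(
    """
    CREATE TEMP TABLE Temp2 AS
    SELECT Region, Country, Manufacturer, Brand, Period, SUM(Spend) AS Spend
    FROM myTable
    GROUP BY Region, Country, Manufacturer, Brand, Period
    """
)

# Step 3: attach the brand count to each aggregated row.
final = con.execute(
    """
    SELECT a.Region, a.Country, a.Manufacturer, a.Brand, a.Period, a.Spend, b.BrandCount
    FROM Temp2 AS a
    LEFT JOIN Temp1 AS b
      ON a.Region = b.Region AND a.Country = b.Country
     AND a.Manufacturer = b.Manufacturer AND a.Period = b.Period
    ORDER BY 1, 2, 3, 4, 5
    """
).fetchall()

for row in final:
    print(*row)
```

This reads the base table twice (once per temp table) and then joins, which is the extra work the question hopes to avoid.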
I'm sure there is a more efficient way to do this, right? Thanks in advance for your suggestions/answers!
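The tag on your question, window-functions, suggests you already have a good idea of the answer.
For the number of distinct brands grouped by Region, Country, Manufacturer and Period, you can write: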
Select Region
,Country
,Manufacturer
,Brand
,Period
,Spend
,DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand asc)
+ DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand desc)
-1 UniqBrandCount
From myTable T1
Order By 1,2,3,4
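Borrowed heavily from this question: https://dba.stackexchange.com/questions/89031/using-distinct-in-window-function-with-over
COUNT(DISTINCT ...) is not allowed with an OVER clause, so DENSE_RANK is needed instead. Rank the brands in ascending and descending order, add the two ranks, and subtract 1 to get the distinct count.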
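A minimal sketch of the double DENSE_RANK query on the sample rows, assuming Python's sqlite3 module with SQLite 3.25 or newer (needed for window functions) in place of SQL Server. It cross-checks every row against an ordinary GROUP BY with COUNT(DISTINCT Brand).

```python
import sqlite3

# Sample rows from the question; the rows elided by "..." are not included.
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
    " Brand TEXT, Period INTEGER, Spend INTEGER)"
)
con.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Ascending rank + descending rank - 1 = number of distinct brands in the partition.
result = con.execute(
    """
    SELECT Region, Country, Manufacturer, Brand, Period, Spend,
           DENSE_RANK() OVER (PARTITION BY Region, Country, Manufacturer, Period ORDER BY Brand ASC)
         + DENSE_RANK() OVER (PARTITION BY Region, Country, Manufacturer, Period ORDER BY Brand DESC)
         - 1 AS UniqBrandCount
    FROM myTable
    ORDER BY 1, 2, 3, 4, 5, 6
    """
).fetchall()

# Reference counts from a plain GROUP BY, keyed by (Region, Country, Manufacturer, Period).
reference = {
    (r, c, m, p): n
    for r, c, m, p, n in con.execute(
        "SELECT Region, Country, Manufacturer, Period, COUNT(DISTINCT Brand)"
        " FROM myTable GROUP BY Region, Country, Manufacturer, Period"
    )
}

matches = all(row[6] == reference[(row[0], row[1], row[2], row[4])] for row in result)
print(matches)  # True
```

Note that this returns one row per raw row (11 here), with the count repeated; Spend is not yet aggregated.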
Your SUM can also be rewritten with PARTITION BY logic. That way you can use a different grouping level for each aggregate:
SELECT
[Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
,dense_rank() OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Period] Order by Brand)
+ dense_rank() OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Period] Order by Brand Desc)
- 1
AS [BrandCount]
,SUM([Spend]) OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]) as [Spend]
from
myTable
ORDER BY 1,2,3,4
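You may then want to reduce the number of rows in the output, because this syntax returns as many rows as myTable, with each aggregate total repeated on every row it applies to:
Region Country Manufacturer Brand Period BrandCount Spend
R1 C1 M1 B1 2016 2 5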
R1 C1 M1 B1 2017 2 30 --dup1
R1 C1 M1 B1 2017 2 30 --dup1
R1 C1 M1 B2 2016 2 15
R1 C1 M1 B3 2017 2 20
R1 C2 M1 B1 2017 1 5
R1 C2 M2 B4 2017 2 25
R1 C2 M2 B5 2017 2 30
R2 C3 M1 B1 2017 1 35
R2 C3 M2 B4 2017 2 40
R2 C3 M2 B5 2017 2 45
Selecting the distinct rows from this output gives you what you need.
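Here is a sketch of that last step: wrap the windowed query in a derived table and apply SELECT DISTINCT. It assumes Python's sqlite3 module (SQLite 3.25+) as a stand-in for SQL Server and uses the sample rows without the "..." rows; the alias w is illustrative.

```python
import sqlite3

# Sample rows from the question; the rows elided by "..." are not included.
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
    " Brand TEXT, Period INTEGER, Spend INTEGER)"
)
con.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# The answer's query: each aggregate uses its own PARTITION BY grouping level.
windowed_sql = """
    SELECT Region, Country, Manufacturer, Brand, Period,
           DENSE_RANK() OVER (PARTITION BY Region, Country, Manufacturer, Period ORDER BY Brand)
         + DENSE_RANK() OVER (PARTITION BY Region, Country, Manufacturer, Period ORDER BY Brand DESC)
         - 1 AS BrandCount,
           SUM(Spend) OVER (PARTITION BY Region, Country, Manufacturer, Brand, Period) AS Spend
    FROM myTable
"""

windowed = con.execute(windowed_sql).fetchall()

# Collapse the repeated rows (e.g. the two "--dup1" rows) with SELECT DISTINCT.
deduped = con.execute(
    "SELECT DISTINCT * FROM (" + windowed_sql + ") AS w ORDER BY 1, 2, 3, 4, 5"
).fetchall()

print(len(windowed), len(deduped))  # 11 10
```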
Consider the following data:
Col1 Col2
B 1
B 1
B 3
B 5
B 7
B 9
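DENSE_RANK() ranks each row by the number of distinct values that come before it, plus 1. For Col2:
1 -> 1, 3 -> 2, 5 -> 3, 7 -> 4, 9 -> 5
Ranking in reverse order (with DESC) produces the opposite pattern:
1 -> 5, 3 -> 4, 5 -> 3, 7 -> 2, 9 -> 1
Adding the two ranks together gives the same value for every row:
1 + 5 = 2 + 4 = 3 + 3 = 4 + 2 = 5 + 1 = 6
Putting this into words helps: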
(number of distinct items before + 1) + (number of distinct items after + 1)
= number of distinct OTHER items before AND after + 2
= Total number of distinct items + 1
So, to get the total number of distinct items, add the ascending and descending DENSE_RANK values together and subtract 1.
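The arithmetic above can be checked with a small pure-Python sketch. The dense_rank helper is hypothetical (written here for illustration, not a library function); it mimics DENSE_RANK over a single partition by giving each value 1 plus the number of distinct values ahead of it.

```python
def dense_rank(values, descending=False):
    """Mimic DENSE_RANK over one partition: 1 + number of distinct values ranked ahead."""
    distinct = sorted(set(values), reverse=descending)
    rank_of = {v: i + 1 for i, v in enumerate(distinct)}
    return [rank_of[v] for v in values]


col2 = [1, 1, 3, 5, 7, 9]  # Col2 from the example data

asc = dense_rank(col2)
desc = dense_rank(col2, descending=True)

# asc + desc - 1 is the same for every row and equals the number of distinct values.
distinct_counts = [a + d - 1 for a, d in zip(asc, desc)]

print(asc)              # [1, 1, 2, 3, 4, 5]
print(desc)             # [5, 5, 4, 3, 2, 1]
print(distinct_counts)  # [5, 5, 5, 5, 5, 5]
```

The duplicated 1 gets the same rank in both directions, which is why duplicates do not inflate the count.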
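The double DENSE_RANK idea means you need two sorts (assuming no index supplies the sort order). Assuming there are no NULL brands (the same assumption that idea makes, since DENSE_RANK ranks NULL as a value while COUNT(DISTINCT) ignores it), you can use a single DENSE_RANK and a windowed MAX, as shown below: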
WITH T1
AS (SELECT *,
DENSE_RANK() OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period] ORDER BY Brand) AS [dr]
FROM myTable),
T2
AS (SELECT *,
MAX([dr]) OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period]) AS UniqBrandCount
FROM T1)
SELECT [Region],
[Country],
[Manufacturer],
[Brand],
Period,
SUM([Spend]) AS [Spend],
MAX(UniqBrandCount) AS UniqBrandCount
FROM T2
GROUP BY [Region],
[Country],
[Manufacturer],
[Brand],
[Period]
ORDER BY [Region],
[Country],
[Manufacturer],
[Period],
Brand
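The above has some unavoidable spooling (it isn't possible to do this in a 100% streaming fashion), but only a single sort.
Oddly, the final ORDER BY clause is needed to reduce the number of sorts to one (or zero if a suitable index exists).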
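A sketch of the single DENSE_RANK plus windowed MAX version, run through Python's sqlite3 module (SQLite 3.25+ assumed) on the sample rows without the "..." rows. SQLite produces different plans from SQL Server, so this only checks the result set, not the spool and sort behaviour described above.

```python
import sqlite3

# Sample rows from the question; the rows elided by "..." are not included.
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
    " Brand TEXT, Period INTEGER, Spend INTEGER)"
)
con.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)

result = con.execute(
    """
    WITH T1 AS (
        -- One ascending dense rank per brand within each group.
        SELECT *,
               DENSE_RANK() OVER (PARTITION BY Region, Country, Manufacturer, Period
                                  ORDER BY Brand) AS dr
        FROM myTable
    ),
    T2 AS (
        -- The highest dense rank in the group equals its number of distinct (non-NULL) brands.
        SELECT *,
               MAX(dr) OVER (PARTITION BY Region, Country, Manufacturer, Period) AS UniqBrandCount
        FROM T1
    )
    SELECT Region, Country, Manufacturer, Brand, Period,
           SUM(Spend) AS Spend,
           MAX(UniqBrandCount) AS UniqBrandCount
    FROM T2
    GROUP BY Region, Country, Manufacturer, Brand, Period
    ORDER BY Region, Country, Manufacturer, Period, Brand
    """
).fetchall()

for row in result:
    print(*row)
```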