T-SQL Unequal Decile based on totals in another column

I couldn't find an exact answer for what I'm looking for. I have a table with an ID and two values. I need to sort by the first value column from low to high and then split the list into deciles so that each decile has an equal (or nearly equal) total of value2. To save space, here is an example using quartiles:

I have:

ID  value1  value2      
1     2      132        
2     6      182        
3     5      195        
4     8      152        
5     3      132        
6     9      129        
7     3      180        
8     9      120        
9     3      172        
10    6      192        
11    9      177        
12    12     151        

The total of value2 is 1914, so each quartile should sum to about 478.5.

Sorting by value1 gives the following, but I need to be able to assign the quartiles so that each one totals about 478.5. I entered the sample quartiles manually; they may or may not be correct according to the calculation.

ID  value1  value2  Qtle    
1     2      132      1 
5     3      132      1 
7     3      180      1 
9     3      172      2 
3     5      195      2 
2     6      182      3 
10    6      192      3 
4     8      152      3 
6     9      129      4 
8     9      120      4 
11    9      177      4 
12   12      151      4 
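Since the sample quartiles were entered by hand, one quick way to judge them is to total value2 per assigned quartile and compare each total with the target of 1914 / 4 = 478.5. A minimal Python sketch, using only the rows and quartile labels from the table above:

```python
# (ID, value1, value2, manually assigned quartile) from the table above
rows = [
    (1, 2, 132, 1), (5, 3, 132, 1), (7, 3, 180, 1),
    (9, 3, 172, 2), (3, 5, 195, 2),
    (2, 6, 182, 3), (10, 6, 192, 3), (4, 8, 152, 3),
    (6, 9, 129, 4), (8, 9, 120, 4), (11, 9, 177, 4), (12, 12, 151, 4),
]

total = sum(v2 for _, _, v2, _ in rows)
target = total / 4  # ideal value2 total per quartile

# Sum value2 within each manually assigned quartile
per_quartile = {}
for _, _, v2, q in rows:
    per_quartile[q] = per_quartile.get(q, 0) + v2

# Print each quartile's total and its distance from the target
for q in sorted(per_quartile):
    print(q, per_quartile[q], per_quartile[q] - target)
```

For this data the manual quartiles total between 367 and 577, so they are only a rough split. Note also that rows with value1 = 3 appear in both quartile 1 and quartile 2; a running total over value1 with SQL Server's default window frame gives tied rows the same running total, so it would keep them together.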

Sorry about the formatting; this is my first post.

Edit 1: I think I might have solved it, although my solution is probably not as elegant as it could be.

Edit 2: Added the sample quartiles above and fixed the code below to compute quartiles instead of deciles. Also fixed the sum of value2.

SELECT value1
    ,value2
    ,SUM(value2) OVER (ORDER BY value1) AS CumSum
    -- Compare the running total with 1/4, 2/4 and 3/4 of the grand total.
    -- Dividing by 4.0 avoids integer division (1914 / 4 = 478 instead of 478.5).
    ,CASE
        WHEN SUM(value2) OVER (ORDER BY value1) < 1 * (SELECT SUM(value2) FROM Table1) / 4.0 THEN 1
        WHEN SUM(value2) OVER (ORDER BY value1) < 2 * (SELECT SUM(value2) FROM Table1) / 4.0 THEN 2
        WHEN SUM(value2) OVER (ORDER BY value1) < 3 * (SELECT SUM(value2) FROM Table1) / 4.0 THEN 3
        ELSE 4
     END AS Quartile
FROM Table1;
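The CASE expression above compares each running total with 1/4, 2/4 and 3/4 of the grand total using a strict `<`. The following Python sketch mirrors that logic for the sample data, assuming SQL Server's default frame for SUM() OVER (ORDER BY value1), under which rows that tie on value1 share one running total. It also checks that dividing an integer SUM by 4 (which truncates in T-SQL, 1914 / 4 = 478) and dividing by 4.0 (478.5) give the same quartiles here; they could differ for a row sitting exactly on a boundary. `case_quartiles` is a hypothetical helper name used only for illustration.

```python
from itertools import groupby

# (ID, value1, value2) sample rows from the question
rows = [
    (1, 2, 132), (2, 6, 182), (3, 5, 195), (4, 8, 152),
    (5, 3, 132), (6, 9, 129), (7, 3, 180), (8, 9, 120),
    (9, 3, 172), (10, 6, 192), (11, 9, 177), (12, 12, 151),
]

def case_quartiles(rows, divide):
    """Mirror the CASE expression: strict '<' against k * total / 4."""
    total = sum(v2 for _, _, v2 in rows)
    bounds = [divide(k * total, 4) for k in (1, 2, 3)]
    result = {}
    running = 0
    ordered = sorted(rows, key=lambda r: r[1])
    # Default RANGE frame: all rows with the same value1 get the same running total
    for _, group in groupby(ordered, key=lambda r: r[1]):
        group = list(group)
        running += sum(v2 for _, _, v2 in group)
        # First bound the running total stays below, else the ELSE branch (4)
        quartile = next((k + 1 for k, b in enumerate(bounds) if running < b), 4)
        for row_id, _, _ in group:
            result[row_id] = quartile
    return result

int_div = case_quartiles(rows, lambda a, b: a // b)  # like INT / 4 (truncating)
real_div = case_quartiles(rows, lambda a, b: a / b)  # like INT / 4.0
print(int_div == real_div)
print(int_div)
```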

I hope I've understood this correctly...

The following is a generic approach. You can specify the number of tiles with the variable @TileCount:

DECLARE @Table1 TABLE(ID INT,value1 INT,value2 INT);
INSERT INTO @Table1 VALUES      
 (1,2,132)        
,(2,6,182)        
,(3,5,195)        
,(4,8,152)        
,(5,3,132)        
,(6,9,129)        
,(7,3,180)        
,(8,9,120)        
,(9,3,172)        
,(10,6,192)        
,(11,9,177)        
,(12,12,151);

DECLARE @TileCount INT=4;

WITH Sums AS
(
    SELECT TOP (@TileCount) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS TileRank
              ,A.SumTotal
              ,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (A.SumTotal / CAST(@TileCount AS FLOAT)) AS SumPart  
    FROM master..spt_values
    CROSS APPLY(SELECT (SELECT SUM(value2) FROM @Table1) AS SumTotal)AS A
)
,AddCumSum AS
(
    SELECT value1
          ,value2
          ,SUM(value2) OVER (ORDER BY value1) CumSum
     FROM @Table1
)
SELECT AddCumSum.*
      ,A.SumPart
      ,A.TileRank AS Tile
FROM AddCumSum
OUTER APPLY(SELECT TOP 1 * FROM Sums WHERE CumSum<=SumPart ORDER BY TileRank ASC) AS A;

The result:

+--------+--------+--------+---------+------+
| value1 | value2 | CumSum | SumPart | Tile |
+--------+--------+--------+---------+------+
| 2      | 132    | 132    | 478.5   | 1    |
| 3      | 132    | 616    | 957     | 2    |
| 3      | 180    | 616    | 957     | 2    |
| 3      | 172    | 616    | 957     | 2    |
| 5      | 195    | 811    | 957     | 2    |
| 6      | 182    | 1185   | 1435.5  | 3    |
| 6      | 192    | 1185   | 1435.5  | 3    |
| 8      | 152    | 1337   | 1435.5  | 3    |
| 9      | 120    | 1763   | 1914    | 4    |
| 9      | 129    | 1763   | 1914    | 4    |
| 9      | 177    | 1763   | 1914    | 4    |
| 12     | 151    | 1914   | 1914    | 4    |
+--------+--------+--------+---------+------+
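As a cross-check of the result table, here is a minimal Python sketch of the same algorithm outside SQL Server: the upper bound of tile k is k * total / tile_count, and each row gets the smallest k whose bound is at least its running total (rows tied on value1 share one running total, as with the default window frame). `assign_tiles` is a hypothetical helper name, not part of the answer.

```python
from itertools import groupby

# (value1, value2) sample rows from the question
rows = [(2, 132), (6, 182), (5, 195), (8, 152), (3, 132), (9, 129),
        (3, 180), (9, 120), (3, 172), (6, 192), (9, 177), (12, 151)]

def assign_tiles(rows, tile_count):
    """Return (value1, value2, cum_sum, tile) tuples sorted by value1."""
    total = sum(v2 for _, v2 in rows)
    # Counterpart of the Sums CTE: SumPart for TileRank 1..tile_count
    sum_parts = [k * total / tile_count for k in range(1, tile_count + 1)]
    out = []
    running = 0
    for _, group in groupby(sorted(rows), key=lambda r: r[0]):
        group = list(group)
        running += sum(v2 for _, v2 in group)  # same total for tied value1
        # Smallest TileRank with CumSum <= SumPart (None ~ no match in OUTER APPLY)
        tile = next((k for k, part in enumerate(sum_parts, start=1) if running <= part), None)
        out.extend((v1, v2, running, tile) for v1, v2 in group)
    return out

for row in assign_tiles(rows, 4):
    print(row)
```

This sketch multiplies before dividing, so the last bound equals the total exactly. Computing SumPart as TileRank * (SumTotal / @TileCount) in floating point can leave the last bound slightly below the total for some totals and tile counts, in which case the last row would find no tile.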

Some explanation:

The CTE Sums computes a few values so that they can be used like named variables. @TileCount is used in the TOP clause together with ROW_NUMBER() over rows selected from master..spt_values. That table is nothing more than a well-filled table: we are not interested in its values, we only need it as a source of at least @TileCount rows to generate a running number. For each TileRank, SumPart = TileRank * (SumTotal / @TileCount) is the upper bound of the running total for that tile.
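In effect, Sums is a small lookup table of tile upper bounds. With the sample total of 1914 and @TileCount = 4 it holds the four (TileRank, SumPart) pairs seen in the SumPart column of the result. A Python sketch of the same list, where range() stands in for ROW_NUMBER() over master..spt_values:

```python
total = 1914       # SUM(value2) over the sample table
tile_count = 4     # @TileCount

# (TileRank, SumPart) pairs, like the rows produced by the Sums CTE
sums = [(rank, rank * (total / tile_count)) for rank in range(1, tile_count + 1)]
print(sums)  # [(1, 478.5), (2, 957.0), (3, 1435.5), (4, 1914.0)]
```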

The second CTE, AddCumSum, returns each row together with the running total of value2.
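Tie handling matters for this running total. With ORDER BY value1 and no explicit frame, SQL Server uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the three rows with value1 = 3 all show 616. With a ROWS frame they would get 264, 444 and 616 (in an arbitrary order among themselves) and could land in different tiles. The Python sketch below contrasts the two frames; in the ROWS variant, ties are broken by input order purely for illustration.

```python
from itertools import accumulate, groupby

# (value1, value2) rows in the question's ID order
rows = [(2, 132), (6, 182), (5, 195), (8, 152), (3, 132), (9, 129),
        (3, 180), (9, 120), (3, 172), (6, 192), (9, 177), (12, 151)]

ordered = sorted(rows, key=lambda r: r[0])  # stable sort: ties keep ID order

# ROWS frame: every row adds to the running total individually
rows_frame = list(accumulate(v2 for _, v2 in ordered))

# RANGE frame (the default): peers with equal value1 share one total
range_frame = []
running = 0
for _, group in groupby(ordered, key=lambda r: r[0]):
    group = list(group)
    running += sum(v2 for _, v2 in group)
    range_frame.extend([running] * len(group))

print(rows_frame)
print(range_frame)
```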

The final SELECT finds, for each row, the smallest TileRank whose SumPart is greater than or equal to the running total.
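Because SumPart increases with TileRank, "the smallest TileRank with CumSum <= SumPart" amounts to a binary search for the first bound that is not less than the running total. A Python sketch using bisect_left and the bounds from the result table; `tile_for` is a hypothetical helper. A running total above the last bound (for example after floating-point rounding) finds no tile, which corresponds to the NULL that OUTER APPLY would return.

```python
from bisect import bisect_left

sum_parts = [478.5, 957.0, 1435.5, 1914.0]  # SumPart for TileRank 1..4

def tile_for(cum_sum):
    """Smallest 1-based TileRank whose SumPart >= cum_sum, or None."""
    i = bisect_left(sum_parts, cum_sum)
    return i + 1 if i < len(sum_parts) else None

print([tile_for(c) for c in (132, 616, 811, 1185, 1337, 1763, 1914)])
print(tile_for(1914.0000001))  # beyond the last bound -> None
```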
