
How to get the median for every record?

There is no median function in SQL Server, so I used this great suggestion:

https://stackoverflow.com/a/2026609/117700

This calculates the median of the whole data set, but I need the median for every record (that is, one median per client_id).

My data set is:

+-----------+-------------+
| client_id | TimesTested |
+-----------+-------------+
|    214220 |           1 |
|    215425 |           1 |
|    212839 |           4 |
|    215249 |           1 |
|    210498 |           3 |
|    110655 |           1 |
|    110655 |           1 |
|    110655 |          12 |
|    215425 |           4 |
|    100196 |           1 |
|    110032 |           1 |
|    110032 |           1 |
|    101944 |           3 |
|    101232 |           2 |
|    101232 |           1 |
+-----------+-------------+
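To pin down what "the median for every record" should return for the rows above, here is a small Python sketch (not part of the original question) that computes a reference median per client_id. It assumes no timestested>1 filter and is meant only as a cross-check for whatever SQL you end up with.

```python
from collections import defaultdict
from statistics import median

# Sample rows copied from the table above: (client_id, TimesTested)
ROWS = [
    (214220, 1), (215425, 1), (212839, 4), (215249, 1), (210498, 3),
    (110655, 1), (110655, 1), (110655, 12), (215425, 4), (100196, 1),
    (110032, 1), (110032, 1), (101944, 3), (101232, 2), (101232, 1),
]


def median_per_client(rows):
    """Group TimesTested values by client_id and take the median of each group."""
    groups = defaultdict(list)
    for client_id, times_tested in rows:
        groups[client_id].append(times_tested)
    # statistics.median returns the middle value for an odd count and the
    # mean of the two middle values for an even count.
    return {cid: median(values) for cid, values in groups.items()}


if __name__ == "__main__":
    for cid, m in sorted(median_per_client(ROWS).items()):
        print(cid, m)
```

For example, client 215425 has the values 1 and 4, so its median is 2.5, while client 110655 (1, 1, 12) has a median of 1.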

Here is the query I'm using:

select client_id,  
    (
    SELECT
    (
     (SELECT MAX(TimesTested ) FROM
       (SELECT TOP 50 PERCENT t.TimesTested 
       FROM counted3 t 
       where t.timestested>1 
       and CLIENT_ID=t.CLIENT_ID 
       ORDER BY t.TimesTested ) AS BottomHalf)
     +
     (SELECT MIN(TimesTested ) FROM
       (SELECT TOP 50 PERCENT t.TimesTested 
       FROM counted3 t 
       where t.timestested>1 
       and CLIENT_ID=t.CLIENT_ID 
       ORDER BY t.TimesTested DESC) AS TopHalf)
    ) / 2 AS Median
    ) TotalAvgTestFreq
from counted3 

group by client_id

But it gives me strange data:

+-----------+------------------+
| client_id | median???????????|
+-----------+------------------+
|    100007 |               84 |
|    100008 |               84 |
|    100011 |               84 |
|    100014 |               84 |
|    100026 |               84 |
|    100027 |               84 |
|    100028 |               84 |
|    100029 |               84 |
|    100042 |               84 |
|    100043 |               84 |
|    100071 |               84 |
|    100072 |               84 |
|    100074 |               84 |
+-----------+------------------+

Can I get the median for each client_id?

I'm currently trying to use this great query from Aaron's site:

select c3.client_id,(
    SELECT AVG(1.0 * TimesTested ) median
    FROM
    (
        SELECT o.TimesTested , 
        rn = ROW_NUMBER() OVER (ORDER BY o.TimesTested ), c.c
        FROM counted3 AS o
        CROSS JOIN (SELECT c = COUNT(*) FROM counted3) AS c
        where count>1
    ) AS x
    WHERE rn IN ((c + 1)/2, (c + 2)/2)
    ) a
    from counted3 c3
    group by c3.client_id

Unfortunately, as Richardthekiwi pointed out:

this is a single median, whereas this question is about a median per partition

How can I join this to counted3 to get a median per partition?
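Richardthekiwi's point can be reproduced with a small Python + SQLite sketch. Assumptions: SQLite 3.25 or later (for window functions), SQLite standing in for SQL Server, and the question's `where count>1` filter left out because the sample table has no such column. Because nothing inside the subquery refers to the outer c3 row, every client receives the same table-wide median.

```python
import sqlite3

# Sample rows from the question: (client_id, TimesTested)
ROWS = [
    (214220, 1), (215425, 1), (212839, 4), (215249, 1), (210498, 3),
    (110655, 1), (110655, 1), (110655, 12), (215425, 4), (100196, 1),
    (110032, 1), (110032, 1), (101944, 3), (101232, 2), (101232, 1),
]

# Aaron's pattern with no PARTITION BY: ROW_NUMBER() and COUNT(*) run over the
# whole table, so the scalar subquery yields one table-wide median.
GLOBAL_MEDIAN_PER_CLIENT = """
SELECT c3.client_id,
       (SELECT AVG(1.0 * x.TimesTested)
        FROM (SELECT o.TimesTested,
                     ROW_NUMBER() OVER (ORDER BY o.TimesTested) AS rn,
                     cnt.c
              FROM counted3 AS o
              CROSS JOIN (SELECT COUNT(*) AS c FROM counted3) AS cnt) AS x
        WHERE x.rn IN ((x.c + 1) / 2, (x.c + 2) / 2)) AS median
FROM counted3 AS c3
GROUP BY c3.client_id
"""


def run(sql):
    """Load the sample rows into an in-memory counted3 table and run sql."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE counted3 (client_id INTEGER, TimesTested INTEGER)")
        conn.executemany("INSERT INTO counted3 VALUES (?, ?)", ROWS)
        return dict(conn.execute(sql).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(GLOBAL_MEDIAN_PER_CLIENT))
```

With 15 sample rows, (c + 1) / 2 and (c + 2) / 2 both evaluate to 8, so the subquery returns the 8th-smallest value overall for every client.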

Try this:

select client_id,  
    (
    SELECT
    (
     (SELECT MAX(testfreq) FROM
       (SELECT TOP 50 PERCENT t.testfreq 
       FROM counted3 t 
       where t.timestested>1 
       and c3.CLIENT_ID=t.CLIENT_ID 
       ORDER BY t.testfreq) AS BottomHalf)
     +
     (SELECT MIN(testfreq) FROM
       (SELECT TOP 50 PERCENT t.testfreq 
       FROM counted3 t 
       where t.timestested>1 
       and c3.CLIENT_ID=t.CLIENT_ID 
       ORDER BY t.testfreq DESC) AS TopHalf)
    ) / 2 AS Median
    ) TotalAvgTestFreq
from counted3 c3

group by client_id

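I added the c3 alias to the outer CLIENT_ID reference and to the outer table. Without it, the unqualified CLIENT_ID inside the subquery resolves to t.CLIENT_ID, so the predicate CLIENT_ID=t.CLIENT_ID is always true and every client gets the same table-wide result.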
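The name-resolution effect behind the identical values can be shown with a small Python + SQLite sketch. It assumes SQLite resolves an unqualified column inside a subquery against the innermost FROM clause first, as SQL Server does; the helper name `per_client_counts` is hypothetical. It counts matching rows per client, first with an unqualified `client_id` and then with `c3.client_id`.

```python
import sqlite3

# Sample rows from the question: (client_id, TimesTested)
ROWS = [
    (214220, 1), (215425, 1), (212839, 4), (215249, 1), (210498, 3),
    (110655, 1), (110655, 1), (110655, 12), (215425, 4), (100196, 1),
    (110032, 1), (110032, 1), (101944, 3), (101232, 2), (101232, 1),
]


def per_client_counts(outer_ref):
    """Count rows of t matching each client, using outer_ref on the left of the predicate."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE counted3 (client_id INTEGER, TimesTested INTEGER)")
        conn.executemany("INSERT INTO counted3 VALUES (?, ?)", ROWS)
        # outer_ref is interpolated only to contrast two fixed spellings of the
        # predicate; never build SQL this way from user input.
        sql = f"""
            SELECT c3.client_id,
                   (SELECT COUNT(*) FROM counted3 t
                    WHERE {outer_ref} = t.client_id) AS n
            FROM counted3 AS c3
            GROUP BY c3.client_id
        """
        return dict(conn.execute(sql).fetchall())
    finally:
        conn.close()


unqualified = per_client_counts("client_id")   # binds to t.client_id: always true
qualified = per_client_counts("c3.client_id")  # correlated with the outer row
```

With the unqualified reference every client "matches" all 15 rows; with the c3 alias, client 110655 matches its own 3 rows.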

Note: if testfreq is an int or bigint, you need to CAST it before averaging; otherwise you get integer division, e.g. (2+5)/2 => 3 when 2 and 5 are the median records. For example: AVG(CAST(testfreq AS float)).
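The integer-division pitfall can be checked with SQLite from Python. SQLite's `/` truncates when both operands are integers, like SQL Server's. One difference to keep in mind: SQLite's AVG always returns a floating-point value, so this sketch exercises the (MAX + MIN) / 2 style of division used in the answer above rather than AVG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 2 and 5 play the role of the two middle values from the note above.
int_div, real_div, cast_div = conn.execute(
    "SELECT (2 + 5) / 2, (2 + 5) / 2.0, (CAST(2 AS REAL) + 5) / 2"
).fetchone()
conn.close()
print(int_div, real_div, cast_div)  # 3 3.5 3.5
```

Making either operand non-integer, by casting or by dividing by 2.0, is enough to keep the fractional half.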

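select client_id, avg(1.0 * testfreq) median_testfreq  -- 1.0 * avoids integer averaging
from
(
    select client_id,
           testfreq,
           rn=row_number() over (partition by CLIENT_ID
                                 order by testfreq),
           c=count(testfreq) over (partition by CLIENT_ID)
    from counted3
    where timestested>1
) g
-- odd c: both values name the middle row; even c: the two middle rows
where rn in ((c + 1) / 2, (c + 2) / 2)
group by client_id;

The median is the middle record when a client has an odd number of rows, or the average of the two middle records when it has an even number. The condition rn in ((c + 1) / 2, (c + 2) / 2) selects the one or two records required: with integer division, c = 5 gives rows 3 and 3, while c = 4 gives rows 2 and 3.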
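The partitioned ROW_NUMBER()/COUNT() approach can be checked end to end with a small Python + SQLite sketch. Assumptions: SQLite 3.25 or later stands in for SQL Server, the median is taken over TimesTested from the sample table (not testfreq), and no timestested>1 filter is applied. It uses the (c + 1) / 2 and (c + 2) / 2 row picks from the Aaron query quoted in the question and compares the result with Python's statistics.median.

```python
import sqlite3
from collections import defaultdict
from statistics import median

# Sample rows from the question: (client_id, TimesTested)
ROWS = [
    (214220, 1), (215425, 1), (212839, 4), (215249, 1), (210498, 3),
    (110655, 1), (110655, 1), (110655, 12), (215425, 4), (100196, 1),
    (110032, 1), (110032, 1), (101944, 3), (101232, 2), (101232, 1),
]

# Number the rows within each client_id, count them, and keep the middle row
# (odd count) or the two middle rows (even count) before averaging.
PARTITIONED_MEDIAN = """
SELECT client_id, AVG(1.0 * TimesTested) AS median
FROM (
    SELECT client_id,
           TimesTested,
           ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY TimesTested) AS rn,
           COUNT(*) OVER (PARTITION BY client_id) AS c
    FROM counted3
) AS g
WHERE rn IN ((c + 1) / 2, (c + 2) / 2)
GROUP BY client_id
"""


def sql_medians():
    """Run the window-function query against an in-memory SQLite copy of counted3."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE counted3 (client_id INTEGER, TimesTested INTEGER)")
        conn.executemany("INSERT INTO counted3 VALUES (?, ?)", ROWS)
        return dict(conn.execute(PARTITIONED_MEDIAN).fetchall())
    finally:
        conn.close()


def reference_medians():
    """The same per-client medians computed in plain Python, for comparison."""
    groups = defaultdict(list)
    for client_id, times_tested in ROWS:
        groups[client_id].append(times_tested)
    return {cid: median(values) for cid, values in groups.items()}


if __name__ == "__main__":
    print(sql_medians() == reference_medians())
```

Multiplying by 1.0 inside AVG mirrors the cast advice given earlier; in SQLite it is harmless, and in SQL Server it prevents an integer-truncated average.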


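Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.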

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM