Oracle SQL Query to calculate the average of a data set, excluding outliers
I have a query with the correct conditions and fields to display:
SELECT t.business_process_id,
       COUNT (tsp.status) AS COUNT,
       ROUND (AVG (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS average,
       ROUND (MAX (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS MAX,
       ROUND (MIN (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS MIN,
       ROUND (MEDIAN (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS MEDIAN,
       ROUND (STDDEV (tsp.end_date - tsp.start_date), 2) AS std_deviation
  FROM transaction_status_period tsp, transaction t
 WHERE t.trans_id = tsp.trans_id
   AND tsp.status = 'R'
   AND tsp.end_date IS NOT NULL
   AND tsp.userid NOT IN ('X', 'Y', 'Z', 'A')
   AND EXTRACT (DAY FROM tsp.start_date) =
          EXTRACT (DAY FROM tsp.end_date)
   AND EXTRACT (YEAR FROM tsp.start_date) =
          EXTRACT (YEAR FROM tsp.end_date)
   AND EXTRACT (MONTH FROM tsp.start_date) =
          EXTRACT (MONTH FROM tsp.end_date)
   AND EXTRACT (YEAR FROM tsp.start_date) = 2013
   AND NOT EXISTS
          (SELECT 1
             FROM transaction_status_period tsp1
            WHERE tsp1.trans_id = tsp.trans_id
              AND tsp.userid = tsp1.userid
              AND tsp1.status = 'S'
              AND tsp1.timestamp < tsp.timestamp)
GROUP BY t.business_process_id
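The average this query computes covers the entire data set in question (year = 2013). Is there a way to have the query compute the average over the 2013 data with outliers excluded? That is, I want the average of the (tsp.end_date - tsp.start_date) date difference, ignoring the extreme values.
Would the percentile_cont function work? I'm not familiar with it, but I do know it can compute a percentile for a given column. In my case I'm looking for the average of the date difference (tsp.end_date - tsp.start_date), but only over the bulk of the data points, with outliers excluded.
Any help would be greatly appreciated. Perhaps I am approaching this query the wrong way.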
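As a rough illustration of the percentile_cont idea raised here: in Oracle, PERCENTILE_CONT returns an interpolated value at a given percentile and can be used as an analytic function with an OVER (PARTITION BY ...) clause, so it can supply per-group cutoffs. The sketch below is only one possible reading of the question. The 0.10 and 0.90 cutoffs, the per-business-process partitioning, and the names duration_days, p10_days, p90_days, row_count and average_minutes are all assumptions made for illustration. It also assumes start_date and end_date are DATE columns, so their difference is a number of days.

```sql
-- Sketch only: average duration per business process, keeping rows whose
-- duration lies between that process's 10th and 90th percentile.
-- The cutoffs (0.10, 0.90) and all column aliases are illustrative assumptions.
SELECT calc.business_process_id,
       COUNT(*) AS row_count,
       -- convert days to minutes before rounding, so precision is not lost
       ROUND(AVG(calc.duration_days) * 24 * 60, 2) AS average_minutes
  FROM (SELECT t.business_process_id,
               tsp.end_date - tsp.start_date AS duration_days,
               PERCENTILE_CONT(0.10)
                   WITHIN GROUP (ORDER BY tsp.end_date - tsp.start_date)
                   OVER (PARTITION BY t.business_process_id) AS p10_days,
               PERCENTILE_CONT(0.90)
                   WITHIN GROUP (ORDER BY tsp.end_date - tsp.start_date)
                   OVER (PARTITION BY t.business_process_id) AS p90_days
          FROM transaction_status_period tsp, transaction t
         WHERE t.trans_id = tsp.trans_id
           AND tsp.status = 'R'
           AND tsp.end_date IS NOT NULL
           AND tsp.userid NOT IN ('X', 'Y', 'Z', 'A')
           -- same calendar day; for DATE columns this matches comparing
           -- day, month and year separately
           AND TRUNC(tsp.start_date) = TRUNC(tsp.end_date)
           AND EXTRACT(YEAR FROM tsp.start_date) = 2013
           AND NOT EXISTS
                  (SELECT 1
                     FROM transaction_status_period tsp1
                    WHERE tsp1.trans_id = tsp.trans_id
                      AND tsp.userid = tsp1.userid
                      AND tsp1.status = 'S'
                      AND tsp1.timestamp < tsp.timestamp)
       ) calc
 WHERE calc.duration_days BETWEEN calc.p10_days AND calc.p90_days
 GROUP BY calc.business_process_id
```

Whether a fixed percentile band is a reasonable definition of "outlier" depends on the data; widening or narrowing the two cutoffs changes how much of each tail is discarded.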
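Would something like this solve your problem?
One option is to compute the mean and standard deviation in an inline view and then use those values to define outliers, for example treating anything more than two standard deviations from the mean as an outlier. The query below takes a percentile-based route instead: the inline view uses NTILE(100) to place each row in one of 100 buckets ordered by duration, and the outer query keeps only buckets 10 through 90: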
SELECT calc.business_process_id,
       COUNT (calc.status) AS COUNT,
       ROUND (AVG (calc.end_date - calc.start_date), 2) * 24 * 60 AS average,
       ROUND (MAX (calc.end_date - calc.start_date), 2) * 24 * 60 AS MAX,
       ROUND (MIN (calc.end_date - calc.start_date), 2) * 24 * 60 AS MIN,
       ROUND (MEDIAN (calc.end_date - calc.start_date), 2) * 24 * 60 AS MEDIAN,
       ROUND (STDDEV (calc.end_date - calc.start_date), 2) AS std_deviation
  FROM (SELECT t.business_process_id,
               tsp.status,
               tsp.start_date,
               tsp.end_date,
               -- assign every row to one of 100 buckets, ordered by duration
               NTILE (100) OVER (ORDER BY (tsp.end_date - tsp.start_date)) AS percentiles
          FROM transaction_status_period tsp, transaction t
         WHERE t.trans_id = tsp.trans_id
           AND tsp.status = 'R'
           AND tsp.end_date IS NOT NULL
           AND tsp.userid NOT IN ('X', 'Y', 'Z', 'A')
           AND EXTRACT (DAY FROM tsp.start_date) =
                  EXTRACT (DAY FROM tsp.end_date)
           AND EXTRACT (YEAR FROM tsp.start_date) =
                  EXTRACT (YEAR FROM tsp.end_date)
           AND EXTRACT (MONTH FROM tsp.start_date) =
                  EXTRACT (MONTH FROM tsp.end_date)
           AND EXTRACT (YEAR FROM tsp.start_date) = 2013
           AND NOT EXISTS
                  (SELECT 1
                     FROM transaction_status_period tsp1
                    WHERE tsp1.trans_id = tsp.trans_id
                      AND tsp.userid = tsp1.userid
                      AND tsp1.status = 'S'
                      AND tsp1.timestamp < tsp.timestamp)
       ) calc
 -- keep buckets 10 through 90, discarding the extremes at both ends
 WHERE calc.percentiles >= 10
   AND calc.percentiles <= 90
GROUP BY calc.business_process_id
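The text of this answer mentions defining outliers by standard deviation, while the query above trims by NTILE bucket. A minimal sketch of the standard-deviation variant might look like the following. It assumes start_date and end_date are DATE columns (so the difference is in days), applies the two-standard-deviation threshold per business process rather than across the whole result set, and uses the illustrative names duration_days, mean_days, stddev_days, row_count and average_minutes, none of which come from the original schema.

```sql
-- Sketch only: compute each process's mean and standard deviation with
-- analytic functions, then keep rows within 2 standard deviations of the mean.
-- The per-process partitioning and all column aliases are assumptions.
SELECT calc.business_process_id,
       COUNT(*) AS row_count,
       -- convert days to minutes before rounding, so precision is not lost
       ROUND(AVG(calc.duration_days) * 24 * 60, 2) AS average_minutes
  FROM (SELECT t.business_process_id,
               tsp.end_date - tsp.start_date AS duration_days,
               AVG(tsp.end_date - tsp.start_date)
                   OVER (PARTITION BY t.business_process_id) AS mean_days,
               STDDEV(tsp.end_date - tsp.start_date)
                   OVER (PARTITION BY t.business_process_id) AS stddev_days
          FROM transaction_status_period tsp, transaction t
         WHERE t.trans_id = tsp.trans_id
           AND tsp.status = 'R'
           AND tsp.end_date IS NOT NULL
           AND tsp.userid NOT IN ('X', 'Y', 'Z', 'A')
           -- same calendar day; for DATE columns this matches comparing
           -- day, month and year separately
           AND TRUNC(tsp.start_date) = TRUNC(tsp.end_date)
           AND EXTRACT(YEAR FROM tsp.start_date) = 2013
           AND NOT EXISTS
                  (SELECT 1
                     FROM transaction_status_period tsp1
                    WHERE tsp1.trans_id = tsp.trans_id
                      AND tsp.userid = tsp1.userid
                      AND tsp1.status = 'S'
                      AND tsp1.timestamp < tsp.timestamp)
       ) calc
 WHERE ABS(calc.duration_days - calc.mean_days) <= 2 * calc.stddev_days
 GROUP BY calc.business_process_id
```

Note that the mean and standard deviation used for the cutoff are themselves computed with the outliers included, so a few very large durations can widen the band they are being tested against. The percentile-based filter does not have that sensitivity, which may be a reason to prefer it on heavily skewed data.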