
Oracle SQL Query to calculate the average of a data set, excluding outliers

I have a query with the correct conditions and the fields I want to display:

  SELECT t.business_process_id,
         COUNT (tsp.status) AS COUNT,
         ROUND (AVG (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS average,
         ROUND (MAX (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS MAX,
         ROUND (MIN (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS MIN,
         ROUND (MEDIAN (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS MEDIAN,
         ROUND (STDDEV (tsp.end_date - tsp.start_date), 2) AS std_deviation
    FROM transaction_status_period tsp, transaction t
   WHERE     t.trans_id = tsp.trans_id
         AND tsp.status = 'R'
         AND tsp.end_date IS NOT NULL
         AND tsp.userid NOT IN ('X', 'Y', 'Z', 'A')
         AND EXTRACT (DAY FROM tsp.start_date) =
                 EXTRACT (DAY FROM tsp.end_date)
         AND EXTRACT (YEAR FROM tsp.start_date) =
                 EXTRACT (YEAR FROM tsp.end_date)
         AND EXTRACT (MONTH FROM tsp.start_date) =
                 EXTRACT (MONTH FROM tsp.end_date)
         AND EXTRACT (YEAR FROM tsp.start_date) = 2013
         AND NOT EXISTS
                     (SELECT 1
                        FROM transaction_status_period tsp1
                       WHERE     tsp1.trans_id = tsp.trans_id
                             AND tsp.userid = tsp1.userid
                             AND tsp1.status = 'S'
                             AND tsp1.timestamp < tsp.timestamp)
GROUP BY t.business_process_id

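The average this query calculates is over the entire data set in question (year = 2013). Is there a way to have the query calculate the average of all the 2013 data while excluding the outliers? That is, find the average of the date differences (tsp.end_date - tsp.start_date) without the outliers.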

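Would the percentile_cont function work? I am not familiar with it, but I do know that it calculates a percentile for a particular column. In my case, I am looking for the average date difference (tsp.end_date - tsp.start_date), but only across the majority of the data points, excluding outliers.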

Any help would be greatly appreciated. Perhaps I am approaching this query the wrong way.

Would something like this solve your problem?

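One option is to calculate the average and standard deviation in an inline view and then use those values to define the outliers, for example by treating anything more than two standard deviations from the mean as an outlier. The query below uses a percentile-based cut instead: it assigns each row to one of 100 buckets with NTILE(100), ordered by duration, and keeps only buckets 10 through 90: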

  SELECT calc.business_process_id,
         COUNT (calc.status) AS COUNT,
         ROUND (AVG (calc.end_date - calc.start_date), 2) * 24 * 60 AS average,
         ROUND (MAX (calc.end_date - calc.start_date), 2) * 24 * 60 AS MAX,
         ROUND (MIN (calc.end_date - calc.start_date), 2) * 24 * 60 AS MIN,
         ROUND (MEDIAN (calc.end_date - calc.start_date), 2) * 24 * 60 AS MEDIAN,
         -- std_deviation stays in days; the columns above are converted to minutes
         ROUND (STDDEV (calc.end_date - calc.start_date), 2) AS std_deviation
    FROM (SELECT t.business_process_id,
                 tsp.status,
                 tsp.start_date,
                 tsp.end_date,
                 -- Rank every row into one of 100 buckets by duration. There is no
                 -- PARTITION BY, so the buckets span all business processes together.
                 NTILE (100) OVER (ORDER BY (tsp.end_date - tsp.start_date))
                    AS percentiles
            FROM transaction_status_period tsp, transaction t
           WHERE     t.trans_id = tsp.trans_id
                 AND tsp.status = 'R'
                 AND tsp.end_date IS NOT NULL
                 AND tsp.userid NOT IN ('X', 'Y', 'Z', 'A')
                 AND EXTRACT (DAY FROM tsp.start_date) =
                         EXTRACT (DAY FROM tsp.end_date)
                 AND EXTRACT (YEAR FROM tsp.start_date) =
                         EXTRACT (YEAR FROM tsp.end_date)
                 AND EXTRACT (MONTH FROM tsp.start_date) =
                         EXTRACT (MONTH FROM tsp.end_date)
                 AND EXTRACT (YEAR FROM tsp.start_date) = 2013
                 AND NOT EXISTS
                             (SELECT 1
                                FROM transaction_status_period tsp1
                               WHERE     tsp1.trans_id = tsp.trans_id
                                     AND tsp.userid = tsp1.userid
                                     AND tsp1.status = 'S'
                                     AND tsp1.timestamp < tsp.timestamp)) calc
   -- Keep only buckets 10 through 90; the lowest and highest buckets are discarded
   WHERE calc.percentiles >= 10 AND calc.percentiles <= 90
GROUP BY calc.business_process_id
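
The query above trims by percentile bucket, so the two-standard-deviation rule mentioned in the text is not actually shown. A minimal sketch of that rule might look like the following. It assumes start_date and end_date are DATE columns (so their difference is a number of days), and it computes the mean and standard deviation per business process with analytic functions. The aliases cnt, average_minutes, duration_days, mean_days, and sd_days are illustrative names, not part of the original schema.

```sql
-- Sketch only: drop rows whose duration lies more than two standard
-- deviations from the mean of their own business process, then aggregate.
  SELECT calc.business_process_id,
         COUNT (*) AS cnt,
         ROUND (AVG (calc.duration_days) * 24 * 60, 2) AS average_minutes
    FROM (SELECT t.business_process_id,
                 tsp.end_date - tsp.start_date AS duration_days,
                 -- Per-process mean and standard deviation, repeated on every row
                 AVG (tsp.end_date - tsp.start_date)
                    OVER (PARTITION BY t.business_process_id) AS mean_days,
                 STDDEV (tsp.end_date - tsp.start_date)
                    OVER (PARTITION BY t.business_process_id) AS sd_days
            FROM transaction_status_period tsp, transaction t
           WHERE     t.trans_id = tsp.trans_id
                 AND tsp.status = 'R'
                 AND tsp.end_date IS NOT NULL
                 AND tsp.userid NOT IN ('X', 'Y', 'Z', 'A')
                 AND EXTRACT (DAY FROM tsp.start_date) =
                         EXTRACT (DAY FROM tsp.end_date)
                 AND EXTRACT (YEAR FROM tsp.start_date) =
                         EXTRACT (YEAR FROM tsp.end_date)
                 AND EXTRACT (MONTH FROM tsp.start_date) =
                         EXTRACT (MONTH FROM tsp.end_date)
                 AND EXTRACT (YEAR FROM tsp.start_date) = 2013
                 AND NOT EXISTS
                             (SELECT 1
                                FROM transaction_status_period tsp1
                               WHERE     tsp1.trans_id = tsp.trans_id
                                     AND tsp.userid = tsp1.userid
                                     AND tsp1.status = 'S'
                                     AND tsp1.timestamp < tsp.timestamp)) calc
   -- The threshold of 2 is a judgment call; adjust it to taste
   WHERE ABS (calc.duration_days - calc.mean_days) <= 2 * calc.sd_days
GROUP BY calc.business_process_id
```

Unlike the NTILE version, this variant partitions by business_process_id, so each process is trimmed against its own distribution. It also converts to minutes before rounding, which avoids rounding the day fraction to two decimals first. Whether a standard-deviation cut or a percentile cut suits the data better depends on how skewed the durations are.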
