
Oracle: Need to calculate rolling average for past 3 months where we have more than one submission per month

I've seen many examples of rolling averages in Oracle, but none do quite what I want.

This is my raw data:

DATE            SCORE   AREA
----------------------------
01-JUL-14       60      A
01-AUG-14       45      A
01-SEP-14       45      A
02-SEP-14       50      A
01-OCT-14       30      A
02-OCT-14       45      A
03-OCT-14       50      A
01-JUL-14       60      B
01-AUG-14       45      B
01-SEP-14       45      B
02-SEP-14       50      B
01-OCT-14       30      B
02-OCT-14       45      B
03-OCT-14       50      B

This is the desired result for my rolling average:

MMYY        AVG     AREA
-------------------------
JUL-14      60      A
AUG-14      52.5    A
SEP-14      50      A
OCT-14      44      A
JUL-14      60      B
AUG-14      52.5    B
SEP-14      50      B
OCT-14      44      B

For each month (MMYY), I need to look back over 3 months, meaning the current month and the two before it, and average the scores per area. For example:

For area A in October, the last 3 months (August to October) contain 6 studies: (45+45+50+30+45+50)/6 = 265/6 ≈ 44.17.
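To make the expected numbers concrete, here is a small Python cross-check (not Oracle code) that groups the sample rows by area and calendar month and then pools every score from each month and the two months before it. The rows are copied from the table above; the helper names `month_index` and `rolling_3_month_avg` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Sample rows copied from the question: (date, score, area)
ROWS = [
    (date(2014, 7, 1), 60, "A"), (date(2014, 8, 1), 45, "A"),
    (date(2014, 9, 1), 45, "A"), (date(2014, 9, 2), 50, "A"),
    (date(2014, 10, 1), 30, "A"), (date(2014, 10, 2), 45, "A"),
    (date(2014, 10, 3), 50, "A"),
    (date(2014, 7, 1), 60, "B"), (date(2014, 8, 1), 45, "B"),
    (date(2014, 9, 1), 45, "B"), (date(2014, 9, 2), 50, "B"),
    (date(2014, 10, 1), 30, "B"), (date(2014, 10, 2), 45, "B"),
    (date(2014, 10, 3), 50, "B"),
]


def month_index(d):
    """Number months consecutively so that 'two months earlier' is just index - 2."""
    return d.year * 12 + (d.month - 1)


def rolling_3_month_avg(rows):
    """Map (area, (year, month)) to the average of every score in that month and the two before it."""
    # Per (area, month index): [sum of scores, number of submissions]
    totals = defaultdict(lambda: [0, 0])
    for d, score, area in rows:
        bucket = totals[(area, month_index(d))]
        bucket[0] += score
        bucket[1] += 1

    result = {}
    for area, m in list(totals):
        # Only months that actually have data contribute to the window.
        window = [totals[(area, k)] for k in (m - 2, m - 1, m) if (area, k) in totals]
        score_sum = sum(s for s, _ in window)
        count = sum(c for _, c in window)
        result[(area, (m // 12, m % 12 + 1))] = score_sum / count
    return result


for (area, (year, month)), avg in sorted(rolling_3_month_avg(ROWS).items()):
    print(f"{area} {year}-{month:02d} {avg:.2f}")
```

For both areas this prints 60.00, 52.50, 50.00 and 44.17 for July to October; the October value is the 265/6 worked out by hand.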

Normally I would write the query like this:

SELECT
  AREA, 
  TO_CHAR(T.DT,'MMYY') MMYY,
  ROUND(AVG(SCORE)
    OVER (PARTITION BY AREA ORDER BY TO_CHAR(T.DT,'MMYY') ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),1)
    AS AVG 
    FROM T

This looks at the last 3 entries, not the last 3 months.
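A small Python illustration (not Oracle code) of the difference, using area A's rows from the question; the helper names are hypothetical. For the 03-OCT-14 row, a 3-row window only sees 30, 45 and 50, while a 3-month window sees all six scores from August to October.

```python
from datetime import date

# Area A rows from the question, already in date order: (date, score)
AREA_A = [
    (date(2014, 7, 1), 60), (date(2014, 8, 1), 45),
    (date(2014, 9, 1), 45), (date(2014, 9, 2), 50),
    (date(2014, 10, 1), 30), (date(2014, 10, 2), 45), (date(2014, 10, 3), 50),
]


def month_index(d):
    return d.year * 12 + (d.month - 1)


def last_3_rows_avg(rows, i):
    """Mimics ROWS BETWEEN 2 PRECEDING AND CURRENT ROW: row i plus the two rows before it."""
    window = [score for _, score in rows[max(0, i - 2): i + 1]]
    return sum(window) / len(window)


def last_3_months_avg(rows, i):
    """Averages every row whose calendar month is row i's month or one of the two before it."""
    current = month_index(rows[i][0])
    window = [score for d, score in rows if current - 2 <= month_index(d) <= current]
    return sum(window) / len(window)


last = len(AREA_A) - 1  # the 03-OCT-14 row
print(f"{last_3_rows_avg(AREA_A, last):.2f}")    # window: 30, 45, 50
print(f"{last_3_months_avg(AREA_A, last):.2f}")  # window: all August to October scores
```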

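One way to do this is to mix aggregate functions with analytic functions: GROUP BY month first, then apply window functions over the grouped rows. The key idea for the average is to avoid avg() and instead divide a windowed sum() of the monthly score totals by a windowed sum of the monthly count(*) values.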

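  SELECT AREA, TO_CHAR(T.DT, 'MMYY') AS MMYY,
         SUM(SCORE) / COUNT(*) AS AvgScore,  -- average within this month only
         -- rolling average: total score / total submissions over this month and the
         -- two preceding grouped rows (assumes no month is missing from the data)
         SUM(SUM(SCORE)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
           / SUM(COUNT(*)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS RollingAvg
  FROM t
  GROUP BY AREA, TO_CHAR(T.DT, 'MMYY');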
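Why sum()/count(*) rather than avg(): after GROUP BY, each row is already a month, so an AVG(AVG(SCORE)) over the window would average the monthly averages and give every month equal weight regardless of how many submissions it had. A quick Python check of both for area A in October (plain arithmetic on the question's data, not Oracle code):

```python
# Area A scores for August to October 2014, grouped by month (from the question's data)
months = {
    "2014-08": [45],
    "2014-09": [45, 50],
    "2014-10": [30, 45, 50],
}

# Averaging the monthly averages gives each month the same weight,
# no matter how many submissions it had.
monthly_avgs = [sum(scores) / len(scores) for scores in months.values()]
avg_of_avgs = sum(monthly_avgs) / len(monthly_avgs)

# Pooling: add up the monthly sums and the monthly counts, then divide once.
# This mirrors SUM(SUM(SCORE)) / SUM(COUNT(*)) over the window.
total = sum(sum(scores) for scores in months.values())
count = sum(len(scores) for scores in months.values())
pooled = total / count

print(f"average of monthly averages: {avg_of_avgs:.2f}")
print(f"pooled average:              {pooled:.2f}")
```

The first prints 44.72 and the second 44.17; only the pooled value matches the hand calculation in the question.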

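Note the ORDER BY clause: the window is ordered by MAX(T.DT), a real date, not by the formatted MMYY string. If your data spans years, ordering by an MMYY string breaks, because strings are compared character by character. If you need a month label that sorts correctly, use a format such as YYYY-MM, whose alphabetical ordering is the same as the chronological ordering.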
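A tiny Python demonstration of the sorting problem across a year boundary. Here `strftime("%m%y")` and `strftime("%Y-%m")` stand in for Oracle's 'MMYY' and 'YYYY-MM' formats, and `sorted()` compares the strings character by character, as ordering by the formatted string would.

```python
from datetime import date

# Two consecutive months that straddle a year boundary
months = [date(2014, 12, 1), date(2015, 1, 1)]

mmyy = sorted(d.strftime("%m%y") for d in months)      # like TO_CHAR(dt, 'MMYY')
yyyy_mm = sorted(d.strftime("%Y-%m") for d in months)  # like TO_CHAR(dt, 'YYYY-MM')

print(mmyy)     # January 2015 sorts before December 2014
print(yyyy_mm)  # chronological order is preserved
```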

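You can also specify ranges, not only rows. A RANGE window is defined by the value of the ORDER BY key rather than by row positions, so it can cover calendar months no matter how many rows each month has.

SELECT DISTINCT  -- every row of a month gets the same value, so keep one per area and month
  AREA, 
  TO_CHAR(T.DT,'MMYY') MMYY,
  -- TRUNC(DT,'MM') maps each date to the first of its month, so all rows of a
  -- month are peers and share one window: that month plus the two before it
  ROUND(AVG(SCORE)
    OVER (PARTITION BY AREA 
      ORDER BY TRUNC(T.DT,'MM') RANGE BETWEEN INTERVAL '2' MONTH PRECEDING AND CURRENT ROW), 1)
    AS AVG 
    FROM T;

Since CURRENT ROW is the default end point, ORDER BY TRUNC(T.DT,'MM') RANGE INTERVAL '2' MONTH PRECEDING works as well. The offset is 2 months rather than 3 because the current month is itself part of the window; with '3' and the raw DT, the row for 01-OCT-14 would also pick up 01-JUL-14. Truncating to the first of the month also sidesteps the 28/29/30/31-days-per-month issue, since subtracting whole months from day 1 always gives a valid date.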
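One practical difference from the grouped ROWS BETWEEN 2 PRECEDING query shows up when a month has no data: ROWS counts grouped rows, so it reaches back past the gap, while a RANGE offset on the truncated month only covers the calendar window. A Python sketch of both behaviours, using hypothetical monthly totals for one area with August and September missing (not data from the question):

```python
def mi(year, month):
    """Consecutive month index, so a RANGE offset of 2 months is just index - 2."""
    return year * 12 + (month - 1)


# Hypothetical (sum of scores, number of submissions) per month; August and September are missing.
monthly = {
    mi(2014, 6): (90, 2),
    mi(2014, 7): (60, 1),
    mi(2014, 10): (125, 3),
}


def rows_window_avg(monthly, m):
    """Like ROWS BETWEEN 2 PRECEDING AND CURRENT ROW on grouped months: the current and two previous rows."""
    keys = sorted(monthly)
    i = keys.index(m)
    window = [monthly[k] for k in keys[max(0, i - 2): i + 1]]
    return sum(s for s, _ in window) / sum(c for _, c in window)


def range_window_avg(monthly, m):
    """Like RANGE BETWEEN INTERVAL '2' MONTH PRECEDING AND CURRENT ROW: months within 2 of the current one."""
    window = [v for k, v in monthly.items() if m - 2 <= k <= m]
    return sum(s for s, _ in window) / sum(c for _, c in window)


october = mi(2014, 10)
print(f"ROWS:  {rows_window_avg(monthly, october):.2f}")   # also includes June and July
print(f"RANGE: {range_window_avg(monthly, october):.2f}")  # October only
```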

See the windowing clause section of the Oracle documentation on analytic functions for further details.

SQL> WITH DATA AS(
     SELECT to_date('01-JUL-14','DD-MON-RR') dt, 60 score, 'A' area FROM dual UNION ALL
     SELECT to_date('01-AUG-14','DD-MON-RR') dt, 45 score, 'A' area FROM dual UNION ALL
     SELECT to_date('01-SEP-14','DD-MON-RR') dt, 45 score, 'A' area FROM dual UNION ALL
     SELECT to_date('02-SEP-14','DD-MON-RR') dt, 50 score, 'A' area FROM dual UNION ALL
     SELECT to_date('01-OCT-14','DD-MON-RR') dt, 30 score, 'A' area FROM dual UNION ALL
     SELECT to_date('02-OCT-14','DD-MON-RR') dt, 45 score, 'A' area FROM dual UNION ALL
     SELECT to_date('03-OCT-14','DD-MON-RR') dt, 50 score, 'A' area FROM dual UNION ALL
     SELECT to_date('01-JUL-14','DD-MON-RR') dt, 60 score, 'B' area FROM dual UNION ALL
     SELECT to_date('01-AUG-14','DD-MON-RR') dt, 45 score, 'B' area FROM dual UNION ALL
     SELECT to_date('01-SEP-14','DD-MON-RR') dt, 45 score, 'B' area FROM dual UNION ALL
     SELECT to_date('02-SEP-14','DD-MON-RR') dt, 50 score, 'B' area FROM dual UNION ALL
     SELECT to_date('01-OCT-14','DD-MON-RR') dt, 30 score, 'B' area FROM dual UNION ALL
     SELECT to_date('02-OCT-14','DD-MON-RR') dt, 45 score, 'B' area FROM dual UNION ALL
     SELECT to_date('03-OCT-14','DD-MON-RR') dt, 50 score, 'B' area FROM dual)
     SELECT   TO_CHAR(T.DT, 'MON-RR') AS MMYY,
              round(
              SUM(SUM(SCORE)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)/
              SUM(COUNT(*)) OVER (PARTITION BY AREA ORDER BY MAX(T.DT) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),1)
              AS avg_score,
              AREA
       FROM data t
       GROUP BY AREA, TO_CHAR(T.DT, 'MON-RR')
     /

MMYY    AVG_SCORE A
------ ---------- -
JUL-14         60 A
AUG-14       52.5 A
SEP-14         50 A
OCT-14       44.2 A
JUL-14         60 B
AUG-14       52.5 B
SEP-14         50 B
OCT-14       44.2 B

8 rows selected.

SQL>

Next time, please provide the CREATE and INSERT statements so that we don't have to spend time preparing a test case.

Also, why the YY format? Haven't you heard of the Y2K bug? Please use the YYYY format.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM