
SQL moving average

How do I create a moving average in SQL?

Current table:

Date             Clicks 
2012-05-01       2,230
2012-05-02       3,150
2012-05-03       5,520
2012-05-04       1,330
2012-05-05       2,260
2012-05-06       3,540
2012-05-07       2,330

Desired table or output:

Date             Clicks    3 day Moving Average
2012-05-01       2,230
2012-05-02       3,150
2012-05-03       5,520          4,360
2012-05-04       1,330          3,330
2012-05-05       2,260          3,120
2012-05-06       3,540          3,320
2012-05-07       2,330          3,010

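This is an evergreen Joe Celko question. I don't know which DBMS platform you are using, but in any case Joe was able to answer it more than 10 years ago with standard SQL.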

Quoting Joe Celko, SQL Puzzles and Answers: "That last update attempt suggests that we could use the predicate to construct a query that would give us a moving average:"

SELECT S1.sample_time, AVG(S2.load) AS avg_prev_hour_load
FROM Samples AS S1, Samples AS S2
WHERE S2.sample_time
BETWEEN (S1.sample_time - INTERVAL 1 HOUR)
AND S1.sample_time
GROUP BY S1.sample_time;
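A minimal sketch of the same self-join, run through Python's built-in sqlite3 module on the question's data. The table clicks(d, n) is hypothetical, and because SQLite has no INTERVAL type, the window bound uses SQLite's date(d, '-2 days') modifier instead.

```python
import sqlite3

# Hypothetical table clicks(d, n) holding the question's data.
DATA = [("2012-05-01", 2230), ("2012-05-02", 3150), ("2012-05-03", 5520),
        ("2012-05-04", 1330), ("2012-05-05", 2260), ("2012-05-06", 3540),
        ("2012-05-07", 2330)]


def celko_moving_avg(rows):
    """Trailing 3-day average via a self-join with a BETWEEN predicate."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE clicks (d TEXT PRIMARY KEY, n INTEGER)")
    con.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    # Each S1 row is paired with every S2 row dated within the 2 days before it
    # (inclusive); AVG then collapses each group to one value per S1 date.
    # ISO 'YYYY-MM-DD' strings compare in chronological order.
    result = con.execute("""
        SELECT S1.d, AVG(S2.n)
        FROM clicks AS S1, clicks AS S2
        WHERE S2.d BETWEEN date(S1.d, '-2 days') AND S1.d
        GROUP BY S1.d
        ORDER BY S1.d
    """).fetchall()
    con.close()
    return result


for d, avg in celko_moving_avg(DATA):
    print(d, round(avg, 2))
```

The first two dates are averaged over only one and two days respectively; several answers below show ways to suppress them.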

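Is the extra column or the query approach better? The query is technically better, because the UPDATE approach denormalizes the database. However, if the recorded historical data is not going to change and computing the moving average is very expensive, you might consider the column approach.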
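To make that trade-off concrete, here is a sketch of the extra-column approach in SQLite: a hypothetical mov_avg3 column on a hypothetical clicks(d, n) table is filled once by an UPDATE with a correlated subquery. It then shows the denormalization risk: after an earlier value changes, the stored average is stale while the query result is not.

```python
import sqlite3

# Hypothetical table clicks(d, n, mov_avg3) holding the question's data.
DATA = [("2012-05-01", 2230), ("2012-05-02", 3150), ("2012-05-03", 5520),
        ("2012-05-04", 1330), ("2012-05-05", 2260), ("2012-05-06", 3540),
        ("2012-05-07", 2330)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clicks (d TEXT PRIMARY KEY, n INTEGER, mov_avg3 REAL)")
con.executemany("INSERT INTO clicks (d, n) VALUES (?, ?)", DATA)

# Column approach: precompute the trailing 3-day average once and store it.
con.execute("""
    UPDATE clicks
    SET mov_avg3 = (SELECT AVG(c2.n) FROM clicks AS c2
                    WHERE c2.d BETWEEN date(clicks.d, '-2 days') AND clicks.d)
""")

# Changing a past value does NOT refresh the stored averages that depend on it.
con.execute("UPDATE clicks SET n = 0 WHERE d = '2012-05-03'")

stored = dict(con.execute("SELECT d, mov_avg3 FROM clicks"))
# Query approach: always computed from the current data.
fresh = dict(con.execute("""
    SELECT c1.d, (SELECT AVG(c2.n) FROM clicks AS c2
                  WHERE c2.d BETWEEN date(c1.d, '-2 days') AND c1.d)
    FROM clicks AS c1
"""))
print(stored["2012-05-04"], fresh["2012-05-04"])
```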

MS SQL Server example:

CREATE TABLE #TestDW
( Date1 datetime,
  LoadValue Numeric(13,6)
);

INSERT INTO #TestDW VALUES('2012-06-09' , '3.540' );
INSERT INTO #TestDW VALUES('2012-06-08' , '2.260' );
INSERT INTO #TestDW VALUES('2012-06-07' , '1.330' );
INSERT INTO #TestDW VALUES('2012-06-06' , '5.520' );
INSERT INTO #TestDW VALUES('2012-06-05' , '3.150' );
INSERT INTO #TestDW VALUES('2012-06-04' , '2.230' );

The SQL puzzle query:

SELECT S1.date1,  AVG(S2.LoadValue) AS avg_prev_3_days
FROM #TestDW AS S1, #TestDW AS S2
WHERE S2.date1
    BETWEEN DATEADD(d, -2, S1.date1 )
    AND S1.date1
GROUP BY S1.date1
order by 1;

One way to do this is to join the same table to itself several times.

select
 C.Date,
 C.Clicks,
 -- "Current" is a reserved word in T-SQL, so the alias is C.
 -- Missing earlier days count as 0 but the sum is still divided by 3,
 -- so the first two rows understate the average.
 (C.Clicks
  + isnull(P1.Clicks, 0)
  + isnull(P2.Clicks, 0)) / 3.0 as MovingAvg3
from
 MyTable as C
 left join MyTable as P1 on P1.Date = DateAdd(day, -1, C.Date)
 left join MyTable as P2 on P2.Date = DateAdd(day, -2, C.Date)

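Adjust the DateAdd part of the ON clauses depending on whether you want the moving average to run strictly from the past up to now, or from a few days back to a few days ahead.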
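A sketch of that adjustment in SQLite, assuming a hypothetical clicks(d, n) table: the same two self-joins are built once with day offsets -1 and -2 (trailing window) and once with -1 and +1 (centered window). Unlike the T-SQL above, it uses inner joins, so dates without a complete window simply drop out instead of being padded with zeros.

```python
import sqlite3

# Hypothetical table clicks(d, n) holding the question's data.
DATA = [("2012-05-01", 2230), ("2012-05-02", 3150), ("2012-05-03", 5520),
        ("2012-05-04", 1330), ("2012-05-05", 2260), ("2012-05-06", 3540),
        ("2012-05-07", 2330)]


def three_point_avg(con, off1, off2):
    """Average of day d and the days d+off1 and d+off2 (offsets in days)."""
    return con.execute("""
        SELECT c.d, (c.n + p1.n + p2.n) / 3.0
        FROM clicks AS c
        JOIN clicks AS p1 ON p1.d = date(c.d, ?)
        JOIN clicks AS p2 ON p2.d = date(c.d, ?)
        ORDER BY c.d
    """, (f"{off1:+d} days", f"{off2:+d} days")).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clicks (d TEXT PRIMARY KEY, n INTEGER)")
con.executemany("INSERT INTO clicks VALUES (?, ?)", DATA)

trailing = three_point_avg(con, -1, -2)  # days d-2, d-1, d
centered = three_point_avg(con, -1, +1)  # days d-1, d, d+1
print(trailing)
print(centered)
```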

  • This works well when you need a moving average over only a few data points.
  • It is not the best solution for a moving average over many data points.
select t2.date, round(sum(ct.clicks)/3) as avg_clicks
from
(select date from clickstable) as t2,
(select date, clicks from clickstable) as ct
where datediff(t2.date, ct.date) between 0 and 2
group by t2.date

Example here.

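Obviously you can change the interval to whatever you need. You could also use count() instead of the magic number 3 to make changing the interval easier, but that will also slow the query down.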
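A small SQLite sketch of that trade-off on the question's data, with a hypothetical clicks(d, n) table: dividing the windowed sum by the literal 3 effectively treats the missing days before the first date as zeros, while SUM(n) * 1.0 / COUNT(*) divides by the number of rows actually in the window. The MySQL datediff() from the query above is replaced by a julianday() difference, which plays the same role in SQLite.

```python
import sqlite3

# Hypothetical table clicks(d, n) holding the question's data.
DATA = [("2012-05-01", 2230), ("2012-05-02", 3150), ("2012-05-03", 5520),
        ("2012-05-04", 1330), ("2012-05-05", 2260), ("2012-05-06", 3540),
        ("2012-05-07", 2330)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clicks (d TEXT PRIMARY KEY, n INTEGER)")
con.executemany("INSERT INTO clicks VALUES (?, ?)", DATA)

rows = con.execute("""
    SELECT t2.d,
           SUM(ct.n) / 3.0            AS magic_number_avg,  -- fixed divisor
           SUM(ct.n) * 1.0 / COUNT(*) AS count_avg          -- rows actually present
    FROM clicks AS t2, clicks AS ct
    WHERE julianday(t2.d) - julianday(ct.d) BETWEEN 0 AND 2
    GROUP BY t2.d
    ORDER BY t2.d
""").fetchall()
for d, magic, counted in rows:
    print(d, round(magic, 1), round(counted, 1))
```

From the third date on, both columns agree; they differ only where the window is incomplete.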

select *
        , (select avg(c2.clicks) from #clicks_table c2 
            where c2.date between dateadd(dd, -2, c1.date) and c1.date) mov_avg
from #clicks_table c1

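Assuming x is the value to average and xDate is the date column, the average over the 3 days ending at a given date (a hypothetical variable @EndDate here) is:

select avg(x) from myTable where xDate between dateadd(d, -2, @EndDate) and @EndDate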

Using a different join predicate:

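-- "current" is a reserved word in T-SQL, so the table is called MyTable here
SELECT c.date
       ,avg(periods.clicks) AS moving_avg
FROM MyTable AS c LEFT OUTER JOIN MyTable AS periods
       -- periods covers c.date and the 2 days before it
       ON periods.date BETWEEN dateadd(d, -2, c.date) AND c.date
GROUP BY c.date HAVING COUNT(*) >= 3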

The HAVING clause prevents any date that does not have at least N values (here N = 3) from being returned.
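To see the HAVING filter at work, here is a sketch in SQLite on the question's data, using a hypothetical clicks(d, n) table and passing the window size N as a query parameter: the first N-1 dates disappear from the result instead of receiving a partial average.

```python
import sqlite3

# Hypothetical table clicks(d, n) holding the question's data.
DATA = [("2012-05-01", 2230), ("2012-05-02", 3150), ("2012-05-03", 5520),
        ("2012-05-04", 1330), ("2012-05-05", 2260), ("2012-05-06", 3540),
        ("2012-05-07", 2330)]


def moving_avg_having(rows, n_days):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE clicks (d TEXT PRIMARY KEY, n INTEGER)")
    con.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    # Join each date to the n_days dates ending at it ('-' || 2 || ' days'
    # builds the modifier '-2 days'); HAVING keeps only complete windows.
    result = con.execute("""
        SELECT c.d, AVG(p.n)
        FROM clicks AS c
        LEFT OUTER JOIN clicks AS p
          ON p.d BETWEEN date(c.d, '-' || (? - 1) || ' days') AND c.d
        GROUP BY c.d
        HAVING COUNT(*) >= ?
        ORDER BY c.d
    """, (n_days, n_days)).fetchall()
    con.close()
    return result


for d, avg in moving_avg_having(DATA, 3):
    print(d, round(avg, 2))
```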

A generic template for rolling averages that scales well to large data sets:

WITH moving_avg AS (
  SELECT 0 AS [lag] UNION ALL
  SELECT 1 AS [lag] UNION ALL
  SELECT 2 AS [lag] UNION ALL
  SELECT 3 AS [lag] --ETC
)
SELECT
  DATEADD(day,[lag],[date]) AS [reference_date],
  [otherkey1],[otherkey2],[otherkey3],
  AVG([value1]) AS [avg_value1],
  AVG([value2]) AS [avg_value2]
FROM [data_table]
CROSS JOIN moving_avg
GROUP BY [otherkey1],[otherkey2],[otherkey3],DATEADD(day,[lag],[date])
ORDER BY [otherkey1],[otherkey2],[otherkey3],[reference_date];
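A sketch of the same lag-table idea in SQLite, with the otherkey columns dropped and a hypothetical clicks(d, n) table: every row is copied forward to d + lag for lags 0 to 2, and grouping on that shifted date yields a trailing 3-day average. A side effect worth knowing: the shifted copies also produce reference dates past the last real date (here 2012-05-08 and 2012-05-09), averaged over fewer rows.

```python
import sqlite3

# Hypothetical table clicks(d, n) holding the question's data.
DATA = [("2012-05-01", 2230), ("2012-05-02", 3150), ("2012-05-03", 5520),
        ("2012-05-04", 1330), ("2012-05-05", 2260), ("2012-05-06", 3540),
        ("2012-05-07", 2330)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clicks (d TEXT PRIMARY KEY, n INTEGER)")
con.executemany("INSERT INTO clicks VALUES (?, ?)", DATA)

rows = con.execute("""
    WITH moving_avg(lag_days) AS (
        SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2
    )
    SELECT date(d, '+' || lag_days || ' days') AS reference_date,
           AVG(n)   AS avg_n,
           COUNT(*) AS n_rows   -- how many source rows fed this average
    FROM clicks CROSS JOIN moving_avg
    GROUP BY reference_date
    ORDER BY reference_date
""").fetchall()
for ref, avg_n, n_rows in rows:
    print(ref, round(avg_n, 2), n_rows)
```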

For a weighted rolling average:

WITH weighted_avg AS (
  SELECT 0 AS [lag], 1.0 AS [weight] UNION ALL
  SELECT 1 AS [lag], 0.6 AS [weight] UNION ALL
  SELECT 2 AS [lag], 0.3 AS [weight] UNION ALL
  SELECT 3 AS [lag], 0.1 AS [weight] --ETC
)
SELECT
  DATEADD(day,[lag],[date]) AS [reference_date],
  [otherkey1],[otherkey2],[otherkey3],
  AVG([value1] * [weight]) / AVG([weight]) AS [wavg_value1],
  AVG([value2] * [weight]) / AVG([weight]) AS [wavg_value2]
FROM [data_table]
CROSS JOIN weighted_avg
GROUP BY [otherkey1],[otherkey2],[otherkey3],DATEADD(day,[lag],[date])
ORDER BY [otherkey1],[otherkey2],[otherkey3],[reference_date];
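A sketch of the weighted variant in SQLite with a hypothetical clicks(d, n) table and the template's weights 1.0, 0.6 and 0.3 (three lags only). Because each group averages the products and the weights over the same rows, AVG(n * w) / AVG(w) equals SUM(n * w) / SUM(w); the sketch checks one date against that formula computed by hand.

```python
import sqlite3

# Hypothetical table clicks(d, n) holding the question's data.
DATA = [("2012-05-01", 2230), ("2012-05-02", 3150), ("2012-05-03", 5520),
        ("2012-05-04", 1330), ("2012-05-05", 2260), ("2012-05-06", 3540),
        ("2012-05-07", 2330)]
WEIGHTS = {0: 1.0, 1: 0.6, 2: 0.3}  # lag in days -> weight

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clicks (d TEXT PRIMARY KEY, n INTEGER)")
con.executemany("INSERT INTO clicks VALUES (?, ?)", DATA)
con.execute("CREATE TABLE weighted_avg (lag_days INTEGER, weight REAL)")
con.executemany("INSERT INTO weighted_avg VALUES (?, ?)", WEIGHTS.items())

rows = dict(con.execute("""
    SELECT date(d, '+' || lag_days || ' days') AS reference_date,
           AVG(n * weight) / AVG(weight)       AS wavg_n
    FROM clicks CROSS JOIN weighted_avg
    GROUP BY reference_date
"""))

# Cross-check 2012-05-03: lag 0 -> 05-03, lag 1 -> 05-02, lag 2 -> 05-01.
expected = (5520 * 1.0 + 3150 * 0.6 + 2230 * 0.3) / (1.0 + 0.6 + 0.3)
print(rows["2012-05-03"], expected)
```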

For this I'd create an auxiliary (dimension) date table, for example:

create table date_dim(date date, date_1 date, date_2 date, date_3 date ...)

date is the key, date_1 holds that day, date_2 holds that day and the day before, date_3 ...

Then you can do an equi-join in Hive.

Or use a view like this:

select date, date               from date_dim
union all
select date, date_add(date, -1) from date_dim
union all
select date, date_add(date, -2) from date_dim
union all
select date, date_add(date, -3) from date_dim
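A sketch of the view-plus-equi-join idea in SQLite rather than Hive; SQLite's date(d, '-1 days') stands in for Hive's date_add(date, -1), and the clicks table and date_window view are hypothetical. The view maps each reference date to itself and the two preceding days, so the moving average becomes a plain equality join followed by GROUP BY.

```python
import sqlite3

# Hypothetical table clicks(d, n) holding the question's data.
DATA = [("2012-05-01", 2230), ("2012-05-02", 3150), ("2012-05-03", 5520),
        ("2012-05-04", 1330), ("2012-05-05", 2260), ("2012-05-06", 3540),
        ("2012-05-07", 2330)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clicks (d TEXT PRIMARY KEY, n INTEGER)")
con.executemany("INSERT INTO clicks VALUES (?, ?)", DATA)

# One row per (reference date, contributing date) pair for a 3-day window.
con.execute("""
    CREATE VIEW date_window AS
        SELECT d AS ref_d, d                  AS src_d FROM clicks
        UNION ALL
        SELECT d AS ref_d, date(d, '-1 days') AS src_d FROM clicks
        UNION ALL
        SELECT d AS ref_d, date(d, '-2 days') AS src_d FROM clicks
""")

rows = con.execute("""
    SELECT w.ref_d, AVG(c.n)
    FROM date_window AS w
    JOIN clicks AS c ON c.d = w.src_d   -- equality join only
    GROUP BY w.ref_d
    ORDER BY w.ref_d
""").fetchall()
print(rows)
```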

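Note: this is not an answer but an enhanced code sample for Diego Scaravaggi's answer. I'm posting it as an answer because the comment section is too limited. Note that I have parameterized the moving-average period.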

declare @p int = 3
declare @t table(d int, bal float)
insert into @t values
(1,94),
(2,99),
(3,76),
(4,74),
(5,48),
(6,55),
(7,90),
(8,77),
(9,16),
(10,19),
(11,66),
(12,47)

select a.d, avg(b.bal)
from
       @t a
       left join @t b on b.d between a.d-(@p-1) and a.d
group by a.d
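A Python/SQLite sketch of the same parameterized query, using the integer day index d and the bal values from the sample above; the T-SQL variable @p becomes a bound named parameter :p. As in the T-SQL version, the first p - 1 rows are averaged over fewer than p values.

```python
import sqlite3

# Same sample balances as above, for days 1..12.
BALANCES = [94, 99, 76, 74, 48, 55, 90, 77, 16, 19, 66, 47]


def moving_avg(p):
    """Trailing average over the last p day indexes (including the current one)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (d INTEGER PRIMARY KEY, bal REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?)", enumerate(BALANCES, start=1))
    # :p is bound once and reused in the BETWEEN bounds.
    rows = con.execute("""
        SELECT a.d, AVG(b.bal)
        FROM t AS a
        LEFT JOIN t AS b ON b.d BETWEEN a.d - (:p - 1) AND a.d
        GROUP BY a.d
        ORDER BY a.d
    """, {"p": p}).fetchall()
    con.close()
    return rows


for d, avg in moving_avg(3):
    print(d, round(avg, 2))
```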
--@p1 is period of moving average, @o1 is offset

declare @p1 as int
declare @o1 as int
set @p1 = 5;
set @o1 = 3;

with np as(
-- number the rows of each cmdty/tenor series by mark date
select *, rank() over(partition by cmdty, tenor order by markdt) as r
from p_prices p1
where
1=1 
)
, x1 as (
-- average the @p1 rows that end @o1 rows before the current row
select s1.*, avg(s2.val) as avgval from np s1
inner join np s2 
on s1.cmdty = s2.cmdty and s1.tenor = s2.tenor
and s2.r between s1.r - (@p1 - 1) - (@o1) and s1.r - (@o1)
group by s1.cmdty, s1.tenor, s1.markdt, s1.val, s1.r
)
select * from x1
order by cmdty, tenor, markdt;
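A sketch of the period-plus-offset window in SQLite, assuming a hypothetical prices(cmdty, markdt, val) table with made-up sample rows; the tenor column is left out and rows are numbered with ROW_NUMBER() (window functions need SQLite 3.25 or later). With period 2 and offset 1, each row gets the average of the two rows ending one row before it, and, as with the inner join above, rows with nothing in their window are dropped.

```python
import sqlite3

# Hypothetical sample prices: (commodity, mark date, value).
PRICES = [("oil", "2024-01-01", 10.0), ("oil", "2024-01-02", 20.0),
          ("oil", "2024-01-03", 30.0), ("oil", "2024-01-04", 40.0),
          ("gas", "2024-01-01", 1.0), ("gas", "2024-01-02", 3.0),
          ("gas", "2024-01-03", 5.0)]


def offset_moving_avg(rows, p1, o1):
    """Average of the p1 rows ending o1 rows before each row, per commodity."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE prices (cmdty TEXT, markdt TEXT, val REAL)")
    con.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    result = con.execute("""
        WITH np AS (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY cmdty ORDER BY markdt) AS r
            FROM prices
        )
        SELECT s1.cmdty, s1.markdt, AVG(s2.val)
        FROM np AS s1
        JOIN np AS s2
          ON s1.cmdty = s2.cmdty
         AND s2.r BETWEEN s1.r - (:p1 - 1) - :o1 AND s1.r - :o1
        GROUP BY s1.cmdty, s1.markdt
        ORDER BY s1.cmdty, s1.markdt
    """, {"p1": p1, "o1": o1}).fetchall()
    con.close()
    return result


for row in offset_moving_avg(PRICES, 2, 1):
    print(row)
```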

I'm not sure your expected result (output) shows a classic 3-day "simple moving (rolling) average", because, by definition, the first three numbers give:

ThreeDaysMovingAverage = (2.230 + 3.150 + 5.520) / 3 = 3.6333333

but you expect 4.360, which is confusing.
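A quick check of that arithmetic in plain Python over all seven days of the question's data, reading the figures as whole clicks (2,230 = 2230): none of the trailing 3-day means reproduces the 4,360, 3,120, 3,320 or 3,010 in the desired output; only 3,333 is close to the expected 3,330.

```python
from fractions import Fraction

clicks = [2230, 3150, 5520, 1330, 2260, 3540, 2330]


def trailing_means(values, window):
    """Simple moving average: exact mean of each run of `window` consecutive values."""
    return [Fraction(sum(values[i - window + 1:i + 1]), window)
            for i in range(window - 1, len(values))]


for m in trailing_means(clicks, 3):
    print(round(float(m), 2))
```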

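Nevertheless, I suggest the following solution using the AVG window function. This approach is much more efficient (clearer and less resource-intensive) than the SELF-JOIN introduced in the other answers (I'm surprised nobody has offered a better solution).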

-- Oracle-SQL dialect 
with
  data_table as (
     select date '2012-05-01' AS dt, 2.230 AS clicks from dual union all
     select date '2012-05-02' AS dt, 3.150 AS clicks from dual union all
     select date '2012-05-03' AS dt, 5.520 AS clicks from dual union all
     select date '2012-05-04' AS dt, 1.330 AS clicks from dual union all
     select date '2012-05-05' AS dt, 2.260 AS clicks from dual union all
     select date '2012-05-06' AS dt, 3.540 AS clicks from dual union all
     select date '2012-05-07' AS dt, 2.330 AS clicks from dual  
  ),
  param as (select 3 days from dual)
select
   dt     AS "Date",
   clicks AS "Clicks",

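   -- ROW_NUMBER() follows the dt order; ROWNUM reflects retrieval order instead
   case when row_number() over (order by dt) >= p.days then 
       avg(clicks) over (order by dt
                          rows between p.days - 1 preceding and current row)
   end    
          AS "3 day Moving Average"
from data_table t, param p
order by dt;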

As you can see, AVG is wrapped in a CASE WHEN ... >= p.days THEN expression, which forces NULL in the first rows, where a "3 day Moving Average" is meaningless.
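The same window-function idea can be tried in SQLite 3.25 or later (check sqlite3.sqlite_version). This sketch uses a hypothetical clicks(d, n) table, fixes the window at 3 rows, and uses ROW_NUMBER() to blank out the first two rows.

```python
import sqlite3

# Hypothetical table clicks(d, n) holding the question's data.
DATA = [("2012-05-01", 2230), ("2012-05-02", 3150), ("2012-05-03", 5520),
        ("2012-05-04", 1330), ("2012-05-05", 2260), ("2012-05-06", 3540),
        ("2012-05-07", 2330)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clicks (d TEXT PRIMARY KEY, n INTEGER)")
con.executemany("INSERT INTO clicks VALUES (?, ?)", DATA)

rows = con.execute("""
    SELECT d, n,
           -- NULL until three rows are available, then the 3-row average
           CASE WHEN ROW_NUMBER() OVER (ORDER BY d) >= 3
                THEN AVG(n) OVER (ORDER BY d ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
           END AS moving_avg_3
    FROM clicks
    ORDER BY d
""").fetchall()
for d, n, avg3 in rows:
    print(d, n, None if avg3 is None else round(avg3, 2))
```

A ROWS frame counts rows, not days, so if some dates were missing from the table the window would silently stretch over more than three calendar days; the self-join versions above do not have that issue.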

In Hive, you could try:

select date, clicks, avg(clicks) over (order by date rows between 2 preceding and current row) as moving_avg from clicktable;

We can apply Joe Celko's "dirty" left outer join method (as cited above by Diego Scaravaggi) to answer the question as it was asked.

declare @ClicksTable table  ([Date] date, Clicks int)
insert into @ClicksTable
    select '2012-05-01', 2230 union all
    select '2012-05-02', 3150 union all
    select '2012-05-03', 5520 union all
    select '2012-05-04', 1330 union all
    select '2012-05-05', 2260 union all
    select '2012-05-06', 3540 union all
    select '2012-05-07', 2330

The query:

SELECT
    T1.[Date],
    T1.Clicks,
    -- AVG ignores NULL values so we have to explicitly NULLify
    -- the days when we don't have a full 3-day sample
    CASE WHEN count(T2.[Date]) < 3 THEN NULL
        ELSE AVG(T2.Clicks) 
    END AS [3-Day Moving Average] 
FROM @ClicksTable T1
LEFT OUTER JOIN @ClicksTable T2
    ON T2.[Date] BETWEEN DATEADD(d, -2, T1.[Date]) AND T1.[Date]
GROUP BY T1.[Date], T1.Clicks

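returns the requested columns. Note that the values are the plain 3-day averages of the sample data, truncated to whole numbers because SQL Server's AVG over an int column returns an int, so they differ from the figures in the question:

Date             Clicks    3-Day Moving Average
2012-05-01       2,230
2012-05-02       3,150
2012-05-03       5,520          3,633
2012-05-04       1,330          3,333
2012-05-05       2,260          3,036
2012-05-06       3,540          2,376
2012-05-07       2,330          2,710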
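For readers without SQL Server, a sketch of that NULL-ifying self-join in SQLite with a hypothetical clicks(d, n) table. SQLite's AVG always returns a floating-point value, so the sketch truncates with int() to mimic SQL Server's integer AVG when printing.

```python
import sqlite3

# Hypothetical table clicks(d, n) holding the question's data.
DATA = [("2012-05-01", 2230), ("2012-05-02", 3150), ("2012-05-03", 5520),
        ("2012-05-04", 1330), ("2012-05-05", 2260), ("2012-05-06", 3540),
        ("2012-05-07", 2330)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clicks (d TEXT PRIMARY KEY, n INTEGER)")
con.executemany("INSERT INTO clicks VALUES (?, ?)", DATA)

rows = con.execute("""
    SELECT T1.d, T1.n,
           -- AVG ignores NULLs, so partial windows are NULLified explicitly
           CASE WHEN COUNT(T2.d) < 3 THEN NULL ELSE AVG(T2.n) END AS moving_avg_3
    FROM clicks AS T1
    LEFT OUTER JOIN clicks AS T2
      ON T2.d BETWEEN date(T1.d, '-2 days') AND T1.d
    GROUP BY T1.d, T1.n
    ORDER BY T1.d
""").fetchall()
for d, n, avg3 in rows:
    print(d, n, "" if avg3 is None else int(avg3))
```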


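Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.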

 
Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM