简体   繁体   中英

Proc expand for irregular time series

How can I calculate the moving average of a time series that has breaks using PROC EXPAND? I am looking for a more efficient way to do this calculation because a DATA steps and JOINS on large datasets take a very long time to execute due to server constrains.

Data:

data have(drop=i);
call streaminit(1);
do i = 1 to 20;
    period = i;
    if i > 10 then period = i + 5;
    if i >17 then period = i + 6;
    x = round(rand('uniform')*10,1.);
    output;
end;
run;

│ period │ x  │
├────────┼────┤
│ 1      │ 9  │
│ 2      │ 10 │
│ 3      │ 5  │
│ 4      │ 9  │
│ 5      │ 7  │
│ 11     │ 9  │
│ 12     │ 9  │
│ 13     │ 5  │
│ 15     │ 8  │
│ 16     │ 9  │

Notice that there are two break points in the period variable: 5-11 and 13-15.
Here is the expected result (3 month moving average):

Proc sql;
create table want as
select a.period, a.x
      ,mean(b.x) as x_avg format=10.2
from have as a
left join have as b
    on a.period -3 < b.period <= a.period
group by 1,2;
Quit;

│ period │ x  │ x_avg │
├────────┼────┼───────┤
│ 1      │ 9  │ 9.00  │
│ 2      │ 10 │ 9.50  │
│ 3      │ 5  │ 8.00  │
│ 4      │ 9  │ 8.00  │
│ 5      │ 7  │ 7.00  │
│ 11     │ 9  │ 9.00  │
│ 12     │ 9  │ 9.00  │
│ 13     │ 5  │ 7.67  │
│ 15     │ 8  │ 6.50  │
│ 16     │ 9  │ 8.50  │

You can make SQL faster by a little modification.

proc sql noprint;
  create table want2 as
  select a.period, a.x ,mean(b1.x,b2.x,a.x) as x_avg format=10.2
  from have as a
  left join have as b1 on a.period -2 = b1.period
  left join have as b2 on a.period -1 = b2.period
  order by a.period;
quit;

Use proc timeseries to add missing values between each gap, then run it through proc expand with method=none . We'll consider the interval daily since it's incrementing by one value at a time. Filter out your final dataset to have no missing values of x .

proc timeseries data = have
                out  = have_ts;
    id period interval=day setmissing=missing;
    var x;
run;

proc expand data = have_ts
            out  = want(where=(NOT missing(x)) );
    id period;
    convert x = x_avg / method=none transform=(moveave 3);
run;

You'll need to reformat period to 8. with proc datasets since proc timeseries needs to treat it as a date.

proc datasets lib=work nolist;
    modify want;
        format x 8.;
quit;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM