
sas - calculate moving average for grouped data with BY statement

I'm a SAS beginner and I'm curious whether the following task can be done more simply than the way I currently have it in my head.

I have the following (simplified) meta data in a table named user_date_money:

User - Date - Money

with various users and dates for every calendar day (for the last 4 years). The data is ordered by User ASC and Date ASC, sample data looks like this:

User   Date         Money
Anna   23.10.2013   5
Anna   24.10.2013   1
Anna   25.10.2013   12
....
Aron   23.10.2013   5
Aron   24.10.2013   12
Aron   25.10.2013   4
....
Zoe    23.10.2013   1
Zoe    24.10.2013   1
Zoe    25.10.2013   0

I now want to calculate a five-day moving average for Money. I started with the fairly popular approach using the lag() function, like this:

data cma;
    set user_date_money;
    if missing(money) then
    do;
        OBS = 0;
        money = 0.0;
    end;
    else OBS = 1;
    money5 = lag5(money);
    OBS5 = lag5(obs);
    if missing(money5) then money5 = 0.0;
    if missing(obs5) then obs5 = 0;

    if _N_ = 1 then
    do;
        SUM = 0.0;
        N = 0;
    end;
    else;
    sum = sum + money - money5;
    n = n + obs - obs5;
    MEAN = sum / n;
    retain sum n;
run;

As you can see, the problem with this method occurs when the data step runs into a new user. Aron would get some lagged values from Anna, which of course should not happen.

Now my question: I am pretty sure you can handle the user switch by adding some extra fields like laggeduser and by resetting the N, Sum and Mean variables if you notice such a switch but:

Can this be done in an easier way? Perhaps using the BY Clause in any way? Thanks for your ideas and help!

Best regards

I think the easiest way is to use PROC EXPAND (part of SAS/ETS):

PROC EXPAND data=user_date_money out=cma;
  ID date;
  BY user;
  CONVERT money=MEAN / transformin=(setmiss 0) transformout=(movave 5);
RUN;

And as mentioned in John's comment, it's important to think about missing values (and about the beginning and ending observations as well). I've added the SETMISS option to the code because you made it clear that you want to zero out missing values rather than ignore them (the default MOVAVE behaviour). If you want to exclude the first 4 observations for each user (since they don't have enough history to calculate a 5-day moving average), you can use the option TRIMLEFT 4 inside TRANSFORMOUT=().
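As a minimal sketch of the TRIMLEFT variant described above, the step could look like the following. METHOD=NONE is an addition not in the original code; it is assumed here so that PROC EXPAND only applies the transformations and does not interpolate the series. The output variable name MEAN is taken from the question.

```sas
/* Sketch only: same step as above, with the first 4 values per user set to missing */
proc expand data=user_date_money out=cma method=none;
    id date;
    by user;
    convert money=MEAN / transformin=(setmiss 0)
                         transformout=(movave 5 trimleft 4);
run;
```

The input must already be sorted by user and date, as it is in the question.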

If you make sure your data is sorted, you can use the automatic FIRST. and LAST. variables (available when you add a BY statement) to initialize your running totals when you get to a new user. These and RETAIN should get you what you need; I don't think lag() is really called for here.
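A rough sketch of that idea, under a few assumptions: the last five values are kept in a temporary array (which SAS retains across rows automatically), the array is cleared at FIRST.user, and the average is only written once a user has five rows. The names window, n_seen and money_ma5 are made up for illustration. MEAN() skips missing values; replace money with coalesce(money, 0) if missing values should count as zero instead.

```sas
data cma;
    set user_date_money;
    by user;

    /* Circular buffer holding the last 5 values for the current user */
    array window[0:4] _temporary_;

    /* Reset the buffer and the row counter at the start of each user */
    if first.user then do;
        call missing(of window[*]);
        n_seen = 0;
    end;

    /* Overwrite the oldest slot with the current value */
    window[mod(n_seen, 5)] = money;
    n_seen + 1;   /* sum statement: n_seen is retained implicitly */

    /* Only report the average once 5 rows are available for this user */
    if n_seen >= 5 then money_ma5 = mean(of window[*]);

    drop n_seen;
run;
```

Because the buffer is reset at every user switch, values from Anna can never leak into Aron's average.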

If your particular need is simple enough, you can calculate it using PROC MEANS and a multilabel format.

data mydata;
do id = 1 to 5;
  datevar = '01JAN2010'd-1;
  do month = 0 to 4;
    datevar=intnx('MONTH',datevar,1,'b');
    sales = floor(500*rand('normal',7))+1500;
    output;
  end;
end;
run;

proc format;
    value movingavg (multilabel notsorted)
        '01JAN2010'd-'31MAR2010'd = 'JAN-MAR 2010'
        '01FEB2010'd-'30APR2010'd = 'FEB-APR 2010'
        '01MAR2010'd-'31MAY2010'd = 'MAR-MAY 2010'
        /* ... more of these ... */
    ;
run;

proc means data=mydata;
class id datevar/mlf order=data;
types id*datevar;
format datevar movingavg.;
var sales;
run;

The PROC FORMAT step can be generated programmatically by using a CNTLIN= dataset; see the SAS documentation for PROC FORMAT for more information.
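A hedged sketch of what such a CNTLIN= dataset might look like for the three windows above. The dataset name movingavg_cntlin and the loop bounds are made up for illustration. HLO='M' is, as far as I know, the flag that marks the format as multilabel, and the NOTSORTED option is not reproduced here; both points should be checked against the PROC FORMAT documentation.

```sas
/* Build one row per 3-month window: FMTNAME, START, END, LABEL, TYPE, HLO */
data movingavg_cntlin;
    length fmtname $32 label $40 hlo $2;
    retain fmtname 'movingavg' type 'N' hlo 'M';   /* 'M' assumed to mean multilabel */
    do i = 0 to 2;
        start = intnx('month', '01JAN2010'd, i, 'b');   /* first day of the window */
        end   = intnx('month', start, 2, 'e');          /* last day, two months later */
        label = catx(' ',
                     upcase(cats(put(start, monname3.), '-', put(end, monname3.))),
                     put(end, year4.));
        output;
    end;
    drop i;
run;

proc format cntlin=movingavg_cntlin;
run;
```

Raising the upper bound of the loop extends the format to more windows without writing each range by hand.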

Yes, you can use by groupings. First, you'll sort by user and date (as you already have).

proc sort data=user_date_money;
    by user date;
run;

Then, redo the data step using the by variable and a counter.

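data cma;
    set user_date_money;
    by user;

    length User_Recs 3
            Average 8;

    retain User_Recs;

    /* Call LAG on every row: a LAG queue is only updated when the
       function executes, so calling it inside an IF block would
       return values from the wrong rows. */
    lag1_money=lag1(money);
    lag2_money=lag2(money);
    lag3_money=lag3(money);
    lag4_money=lag4(money);

    if First.User=1 then User_Recs=0;

    User_Recs=User_Recs+1;

    /* From the 5th row of a user onward, all four lags belong to that user */
    if User_Recs>4 then do;
        Average=(lag4_money+lag3_money+lag2_money+lag1_money+money)/5;
    end;

    drop User_Recs lag1_money lag2_money lag3_money lag4_money;
run;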
