简体   繁体   中英

Calculating rolling correlations in SAS

I have a data set here .

An excerpt of my data looks like this: (For an enlarged version: http://puu.sh/79NCK.jpg )

在此输入图像描述

(Note: there are no missing values in my dataset)

I wish to calculate the correlation matrix using a rolling window of 1 year. My period starts from 01 Jan 2008. So for example, the correlation between AUT and BEL on 01 Jan 2008 is calculated using the series of values from 01 Jan 2007 to 01 Jan 2008, and likewise for all other pairs. Similarly the correlation between AUT and BEL on 02 Jan 2008 is calculated using the series of values from 02 Jan 2007 to 02 Jan 2008.

Since there will be a different correlation matrix for each day, I want to output each day's correlation matrix into a sheet in excel and name that sheet COV1 (for 01 Jan 2008), COV2 (for 02 Jan 2008), COV3 (for 03 Jan 2008), and so on until COV1566 (for 31 Dec 2013). An excerpt of the output for each sheet is like this: (Note: with the titles included on the top row and first column)

http://puu.sh/79NAy.jpg

I have loaded my datafile into SAS named rolling . For the moment, my code is simply:

proc corr data = mm.rolling;
run;

Which simply calculates the correlation matrix using the entire series of values. I am very new to SAS, any help would be appreciated.

Think about how you might do if you had immense amount of patience.

proc corr data = mm.rolling out = correlation_as_of_01jan2008;
where date between '01jan2007'd and '01jan2008'd;
run;

Similarly,

proc corr data = mm.rolling out = correlation_as_of_02jan2008;
where date between '02jan2007'd and '02jan2008'd;
run;

Thankfully you can use SAS macro programming to achieve a similar effect as shown in this macro:

%macro rollingCorrelations(inputDataset=, refDate=);
/*first get a list of unique dates on or after the reference date*/
proc freq data = &inputDataset. noprint;
where date >="&refDate."d;
table date/out = dates(keep = date);
run;


/*for each date calculate what the window range is, here using a year's length*/
data dateRanges(drop = date);
set dates end = endOfFile 
                nobs= numDates;
format toDate fromDate date9.;

toDate=date;
fromDate = intnx('year', toDate, -1, 's');

call symputx(compress("toDate"!!_n_), put(toDate,date9.));
call symputx(compress("fromDate"!!_n_), put(fromDate, date9.) );

/*find how many times(numberOfWindows) we need to iterate through*/
if endOfFile then do;
call symputx("numberOfWindows", numDates);
end;

run;
%do i = 1 %to &numberOfWindows.;
/*create a temporary view which has the filtered data that is passed to PROC CORR*/
data windowedDataview / view = windowedDataview;
set  &inputDataset.;
where date between "&&fromDate&i."d and "&&toDate&i."d;
drop date;
run;
    /*the output dataset from each PROC CORR run will be 
correlation_DDMMMYYY<from date>_DDMMMYY<start date>*/
proc corr data = windowedDataview 
outp = correlations_&&fromDate&i.._&&toDate&i. (where=(_type_ = 'CORR'))

        noprint;
run;

%end;

/*append all datasets into a single table*/
data all_correlations;
format from to date9.;
set correlations_:
     indsname = datasetname
;
from = input(substr(datasetname,19,9),date9.);
to = input(substr(datasetname,29,9), date9.);
run;


%mend rollingCorrelations;
%rollingCorrelations(inputDataset=rolling, refDate=01JAN2008)

The final output from the above macro will have from & to identifier to identify which date range each correlation matrix refers. Run it and examine the results.

I dont think excel can accomodate over 1500 tabs anyway, so best to keep it in a single table. The final table had 81K rows and the whole process ran in 2.5 mins.

update: to sort them by from & to

proc sort data = ALL_CORRELATIONS;
by from to;
run;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM