Using R: Fama MacBeth Regression - Portfolio formation and Stock return ranking

I am very new to R (I used Stata before). I am currently redoing some financial theory tests, which involve the following steps:

  1. declare the time series

  2. calculate relevant variables such as daily returns

  3. rank stock performance (cross-sectionally)

  4. form portfolios

  5. run regressions.

My questions are:

  1. Should I use a wide or a long dataset to make the scripts easier to write and faster to run?
  2. Which commands should I use for each step?

By a wide dataset I mean 900 price columns, one per stock. By a long dataset I mean a single price column, with the 900 stocks stacked in rows and identified by an id column. The data are daily and cover the last 10 years, so there is a lot of data to process, and any experience you can share is very valuable to me.

Wide data example:

dateyyyymmdd          price.AAA       price.BBB
1    2015-10-02           10.1           10.7
2    2015-10-01           10.3           10.4
3    2015-09-30           10.4           10.4
4    2015-09-29           10.6           10.6
5    2015-09-28           10.7           11.0
6    2015-09-25           10.4           10.8
7    2015-09-24            9.8           10.2
8    2015-09-23            9.9           10.1
9    2015-09-22            9.9            9.9
10   2015-09-21           10.1           10.1

Long data example:

dateyyyymmdd             id                price
1    2015-10-02           AAA           10.7
2    2015-10-01           AAA           10.4
3    2015-09-30           AAA           10.4
4    2015-09-29           AAA           10.6
5    2015-09-28           AAA           11.0
6    2015-10-02           BBB           10.8
7    2015-10-01           BBB           10.2
8    2015-09-30           BBB           10.1
9    2015-09-29           BBB            9.9
10   2015-09-28           BBB           10.1
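For question 1, base R's reshape() can convert between the two layouts, so the choice is not permanent. The sketch below assumes the column names of the wide example (a date column plus price.<ticker> columns) and uses only its first three rows. The object names wide, long and wide2 are illustrative.

```r
# First three rows of the wide example (names wide, long, wide2 are illustrative)
wide <- data.frame(
  date      = as.Date(c("2015-10-02", "2015-10-01", "2015-09-30")),
  price.AAA = c(10.1, 10.3, 10.4),
  price.BBB = c(10.7, 10.4, 10.4)
)

# Wide -> long: one row per (date, stock); sep = "." splits "price.AAA"
# into the value name "price" and the id "AAA"
long <- reshape(wide, direction = "long",
                varying = c("price.AAA", "price.BBB"), sep = ".",
                idvar = "date", timevar = "id")
long <- long[order(long$id, long$date), c("date", "id", "price")]
rownames(long) <- NULL
print(long)

# Long -> wide: back to one price.<id> column per stock
wide2 <- reshape(long, direction = "wide",
                 idvar = "date", timevar = "id", v.names = "price", sep = ".")
print(wide2)
```

Roughly speaking, the long layout suits per-stock and per-date grouping with ave() or aggregate(), while the wide layout is what xts and PerformanceAnalytics expect, so converting at the point where the tool changes is a reasonable approach.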

Here are the obstacles I have run into:

  1. Declaring the time series and calculating daily returns: I found declaring and using time series much harder in R than in Stata. I tried ts() and a few other functions, but in the end I did not know how to calculate the daily return of each stock in a "smart way". I tried diff(), but it only gives correct returns if the dates are in the right order.

  2. Ranking stock returns: I have not reached this part yet, but I would appreciate advice on whether wide or long data will save time here. I will have to rank returns across stocks on each day, then group the stocks and calculate parameters for each group.

  3. Running regressions and portfolio analytics: I had a look at the portfolio analytics packages and guessed that they use wide data, since their examples use the tickers as the column names of the data frame (one price column per stock).
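For obstacle 1 in the long layout, sorting by stock and then by date makes a lagged price ratio safe without declaring a time series at all. Below is a minimal base-R sketch, assuming the date/id/price columns of the long example. simple_return is a hypothetical helper, and the rows are deliberately entered newest first, as in the question.

```r
# Long data entered newest first, as in the question
long <- data.frame(
  date  = as.Date(c("2015-10-02", "2015-10-01", "2015-09-30",
                    "2015-10-02", "2015-10-01", "2015-09-30")),
  id    = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  price = c(10.1, 10.3, 10.4, 10.7, 10.4, 10.4)
)

# 1. Sort by stock, then by date, so each stock's prices run oldest -> newest
long <- long[order(long$id, long$date), ]

# 2. Hypothetical helper: simple return p_t / p_{t-1} - 1, NA on the first date
simple_return <- function(p) c(NA, p[-1] / p[-length(p)] - 1)

# ave() applies the helper separately to each stock's price vector and
# returns the results aligned with the original rows
long$ret <- ave(long$price, long$id, FUN = simple_return)
print(long)
```

The sort in step 1 is what makes this correct. Without it, the ratio would compare whatever rows happen to be adjacent, which is the same pitfall the question describes for diff().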

To address your issues:

  1. R has a number of ways of representing time series. The xts package is frequently used for financial data, which form an irregular time series because weekends and holidays are missing. An xts object is always kept in date order, and xts provides a diff method that works on that ordered index, so returns are calculated between consecutive dates. The code below uses the xts package with diff on your example data to calculate returns correctly.
  2. The rank function is used to rank the returns on each day. Since rank works on only a single vector, apply is used to pass each row to rank. Because apply returns the rankings with one column per date, the result is transposed with t(). The matrix of results then has to be restored to an xts time series, which is done with Reclass. Finally, for the purposes of the example, it may be helpful to combine prices and results into one time series, which is done with merge.
  3. You indicate that you're interested in using the PerformanceAnalytics package. xts time series with each asset in its own column work well with PerformanceAnalytics. As an example, the code uses the calculated returns and assumes an equal-weight portfolio to calculate a time series of portfolio returns with the Return.portfolio function from the PerformanceAnalytics package.
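If the data are kept long instead, the ranking and grouping step (rank across stocks on each day, then summarize each group) can be sketched with ave() and aggregate(). The returns below are made-up toy numbers, and n_groups = 2 is an arbitrary choice (10 would give deciles). The sketch also assumes there are no missing returns, because rank() places NAs last and length() would count them.

```r
# Toy long data: 2 days x 4 stocks, made-up returns for illustration only
rets <- data.frame(
  date = as.Date(rep(c("2015-10-01", "2015-10-02"), each = 4)),
  id   = rep(c("AAA", "BBB", "CCC", "DDD"), times = 2),
  ret  = c(0.010, -0.020, 0.005, 0.030,
           -0.010, 0.000, 0.020, 0.015)
)
n_groups <- 2  # assumed choice: 2 = low/high halves, 10 = deciles

# Cross-sectional rank within each date (1 = lowest return that day);
# ties.method = "first" keeps group sizes equal when returns are tied
rets$rank <- ave(rets$ret, rets$date,
                 FUN = function(r) rank(r, ties.method = "first"))

# Map ranks 1..n to groups 1..n_groups of (roughly) equal size per date
n_per_day  <- ave(rets$ret, rets$date, FUN = length)
rets$group <- ceiling(rets$rank * n_groups / n_per_day)

# Equal-weighted mean return of each group on each date
port <- aggregate(ret ~ date + group, data = rets, FUN = mean)
print(port)
```

Replacing mean with another summary function in aggregate() gives other per-group parameters in the same way.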

You mention that you have 10 years of daily data for 900 assets. At roughly 252 trading days per year, that is only about 2,500 rows, which is not at all large for R, but 900 columns probably is. I'd try something like the code below and see whether there are any performance issues. If so, there are a couple of options you could try.

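library(xts)
#   df: the wide data frame from the question (dates in column 1, one price.<ticker> column per stock)
#   transform to an xts time series; xts() sorts the rows by date, so newest-first input is fine
dfx <- xts(df[,-1], order.by=as.Date(df[,1]))
#   simple daily returns p_t/p_{t-1} - 1; the first date has no return (NA), so na.pad=FALSE drops it
df_ret <- diff(dfx, arithmetic=FALSE, na.pad=FALSE)-1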
#   label columns containing returns
colnames(df_ret) <- sub("price", "return",colnames(df_ret))
#   rank returns across stocks within each row (date); rank 1 = lowest return
#   apply() returns one column per date, so t() transposes back; Reclass() restores the xts index of df_ret
df_rank <- Reclass(t(apply(df_ret, 1, rank)))
#   label columns containing ranks
colnames(df_rank) <- sub("return","rank",colnames(df_rank))
# returns and ranks can be combined with prices if desired
dfx <- merge(dfx, df_ret, df_rank)

#  Calculate returns for a portfolio equal-weighted at the beginning of the period and not rebalanced
library(PerformanceAnalytics)
port_ret <- Return.portfolio(R=df_ret, weights=rep(1/ncol(df_ret), ncol(df_ret)))
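The title mentions Fama-MacBeth regressions, which the code above does not cover. Below is a minimal base-R sketch of the cross-sectional step only: one OLS regression of returns on a characteristic for each date, followed by the time-series mean of the coefficients and t-statistics based on their standard deviation divided by sqrt(T). The panel and the characteristic x are hypothetical. In a real test, x would be something like a pre-estimated beta or a firm characteristic. Here the toy returns are an exact linear function of x so that the output can be checked by hand.

```r
# Hypothetical toy panel: 3 dates x 5 stocks. ret is an exact linear function
# of the characteristic x, with a different intercept and slope on each date
# (real returns would contain noise; exactness only makes the result checkable)
x_vals     <- c(-2, -1, 0, 1, 2)
intercepts <- c(0.010, 0.000, 0.020)
slopes     <- c(0.4, 0.6, 0.5)
panel <- data.frame(
  date = rep(as.Date(c("2015-09-30", "2015-10-01", "2015-10-02")), each = 5),
  x    = rep(x_vals, times = 3)
)
panel$ret <- rep(intercepts, each = 5) + rep(slopes, each = 5) * panel$x

# Step 1: one cross-sectional OLS regression of ret on x for each date
by_date <- split(panel, panel$date)
gammas  <- t(sapply(by_date, function(d) coef(lm(ret ~ x, data = d))))

# Step 2: average each coefficient over time; the Fama-MacBeth standard error
# is the standard deviation of the per-date estimates divided by sqrt(#dates)
n_dates <- nrow(gammas)
fm_est  <- colMeans(gammas)
fm_se   <- apply(gammas, 2, sd) / sqrt(n_dates)
fm_t    <- fm_est / fm_se
print(cbind(estimate = fm_est, std.error = fm_se, t.stat = fm_t))
```

These plain standard errors treat the per-date coefficients as uncorrelated over time. They are often adjusted, for example with Newey-West, when that assumption is doubtful.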
