I am very new to R (I used Stata before). I am currently redoing some financial theory tests, which involve:
- declaring time series
- calculating relevant variables such as daily returns
- ranking stock performance (cross-sectionally)
- forming portfolios
- running regressions
My question is: should I work with a wide or a long dataset?
By a wide dataset I mean one with 900 price columns, one for each of 900 stocks. Long data means a single price column, with the 900 stocks stacked in rows. The data are daily over the last 10 years, so this is a lot of data to process, which is why any experience you can share is precious.
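As a rough size check, here is a minimal sketch that assumes about 252 trading days per year (an assumption, not a figure from the question). Under that assumption the wide layout is a numeric matrix of roughly 2,520 rows by 900 columns:

```r
# Rough size check, ASSUMING ~252 trading days per year over 10 years
n_days   <- 10 * 252   # about 2,520 rows (dates)
n_stocks <- 900        # one price column per stock in the wide layout

# Each double takes 8 bytes, so the raw prices take about 18 million bytes
prices <- matrix(0, nrow = n_days, ncol = n_stocks)
print(object.size(prices), units = "Mb")

# The long layout holds the same prices, stacked into one column
print(n_days * n_stocks)   # number of rows in the long layout
```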
Wide data example:
```
   dateyyyymmdd price.AAA price.BBB
1    2015-10-02      10.1      10.7
2    2015-10-01      10.3      10.4
3    2015-09-30      10.4      10.4
4    2015-09-29      10.6      10.6
5    2015-09-28      10.7      11.0
6    2015-09-25      10.4      10.8
7    2015-09-24       9.8      10.2
8    2015-09-23       9.9      10.1
9    2015-09-22       9.9       9.9
10   2015-09-21      10.1      10.1
```
Long data example:
```
   dateyyyymmdd  id price
1    2015-10-02 AAA  10.7
2    2015-10-01 AAA  10.4
3    2015-09-30 AAA  10.4
4    2015-09-29 AAA  10.6
5    2015-09-28 AAA  11.0
6    2015-10-02 BBB  10.8
7    2015-10-01 BBB  10.2
8    2015-09-30 BBB  10.1
9    2015-09-29 BBB   9.9
10   2015-09-28 BBB  10.1
```
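Either layout can be converted into the other, so the choice is not permanent. A minimal base-R sketch using reshape() on the long example above; it treats the last BBB date as 2015-09-28, assuming that is what was meant. With five dates per stock, the wide result has five rows:

```r
# Long layout: one row per (date, stock) pair, values from the long example
dates <- c("2015-10-02", "2015-10-01", "2015-09-30", "2015-09-29", "2015-09-28")
long <- data.frame(
  dateyyyymmdd = rep(dates, times = 2),
  id           = rep(c("AAA", "BBB"), each = length(dates)),
  price        = c(10.7, 10.4, 10.4, 10.6, 11.0,   # AAA
                   10.8, 10.2, 10.1, 9.9, 10.1)    # BBB
)

# Long -> wide: one row per date, one "price.<id>" column per stock
wide <- reshape(long, idvar = "dateyyyymmdd", timevar = "id", direction = "wide")
print(wide)

# Wide -> long again; reshape() stores how the wide data were created
long_again <- reshape(wide, direction = "long")
```

The wide result has the same column names (price.AAA, price.BBB) as the wide example, because reshape() joins the value column and the id with "." by default.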
Here are the obstacles I have run into:
- Declaring time series and calculating daily returns: I found it much harder to declare and use time series than in Stata. I tried ts() and some other functions, but in the end I did not know how to calculate daily returns for each stock in a "smart way". I tried diff(), but it requires the dates to be in the right order.
- Ranking stock returns: I have not reached this part yet. However, I would appreciate advice on whether wide or long data would save time here. I will have to rank returns across stocks on each day, then group the stocks and calculate parameters for each group.
- Running regressions and portfolio analytics: I had a look at portfolio analytics packages and guessed that they use wide data, since the examples show many tickers as column names of the data frame (one price column for each stock).
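If the data stay in the long layout, one possible base-R approach (a sketch, not the method used in the answer below) is to sort by stock and date, compute returns within each stock and ranks within each date with ave(), and then aggregate by group. The prices are the first four rows of the wide example; the two-groups-per-day split is a hypothetical rule chosen only for illustration:

```r
# Long layout (column names from the long example, prices from the wide example)
dates <- as.Date(c("2015-10-02", "2015-10-01", "2015-09-30", "2015-09-29"))
long <- data.frame(
  dateyyyymmdd = rep(dates, times = 2),
  id           = rep(c("AAA", "BBB"), each = length(dates)),
  price        = c(10.1, 10.3, 10.4, 10.6,    # AAA
                   10.7, 10.4, 10.4, 10.6)    # BBB
)

# 1. Sort by stock, then date, so each stock's prices run oldest -> newest
long <- long[order(long$id, long$dateyyyymmdd), ]

# 2. Simple daily return within each stock: p_t / p_{t-1} - 1 (NA on the first day)
long$ret <- ave(long$price, long$id,
                FUN = function(p) c(NA, p[-1] / p[-length(p)] - 1))

# 3. Rank returns across stocks within each date (ties broken by row order, NA kept)
long$rank <- ave(long$ret, long$dateyyyymmdd,
                 FUN = function(r) rank(r, na.last = "keep", ties.method = "first"))

# 4. HYPOTHETICAL grouping rule: split each day's ranked stocks into n_groups bins
n_groups <- 2
long$n_valid <- ave(as.numeric(!is.na(long$ret)), long$dateyyyymmdd, FUN = sum)
long$group <- ceiling(long$rank * n_groups / long$n_valid)

# 5. A per-group statistic: mean return of each group on each day (NA rows dropped)
group_means <- aggregate(ret ~ dateyyyymmdd + group, data = long, FUN = mean)
print(group_means)
```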
To address your issues:
1. The xts package is frequently used for financial data, which form an irregular time series because of missing weekends and holidays. The xts package includes a version of diff() that uses proper date ordering to calculate returns. The code below uses the xts package with diff() on your example data to calculate returns correctly.
2. The rank() function is used to rank returns for each day. Since rank() works on only a single vector of data, the apply() function is used to select each row, rank the returns, and then assemble the rankings into a matrix. The matrix of results then needs to be restored to an xts time series, which is done with Reclass(). Finally, for the purposes of the example, it may be helpful to combine data and results into one time series, which is done with merge().
3. For portfolio analytics, consider the PerformanceAnalytics package. xts time series with each asset in its own column work well with PerformanceAnalytics. As an example, the code uses the calculated returns and assumes an equal-weight portfolio to calculate a time series of portfolio returns with the Return.portfolio() function from the PerformanceAnalytics package.
You mention that you have 10 years of daily data for 900 assets. This number of rows is not at all large for R, but the number of columns probably is. I'd try something like the code below and see whether there are any performance issues. If there are, there are a couple of options you could try.
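```r
library(xts)
library(PerformanceAnalytics)
# wide example data from the question (dates are newest-first; xts() sorts them)
df <- data.frame(
  dateyyyymmdd = c("2015-10-02", "2015-10-01", "2015-09-30", "2015-09-29", "2015-09-28",
                   "2015-09-25", "2015-09-24", "2015-09-23", "2015-09-22", "2015-09-21"),
  price.AAA = c(10.1, 10.3, 10.4, 10.6, 10.7, 10.4, 9.8, 9.9, 9.9, 10.1),
  price.BBB = c(10.7, 10.4, 10.4, 10.6, 11.0, 10.8, 10.2, 10.1, 9.9, 10.1)
)
# transform to an xts time series ordered by date
dfx <- xts(df[,-1], order.by=as.Date(df[,1]))
# calculate simple returns (p_t / p_{t-1} - 1); the first date has no return, so drop it
df_ret <- diff(dfx, arithmetic=FALSE, na.pad=FALSE)-1
# label columns containing returns
colnames(df_ret) <- sub("price", "return", colnames(df_ret))
# rank the returns within each row (date) and restore the result as an xts time series
df_rank <- Reclass(t(apply(df_ret, 1, rank)))
# label columns containing ranks
colnames(df_rank) <- sub("return", "rank", colnames(df_rank))
# returns and ranks can be combined with prices if desired
dfx <- merge(dfx, df_ret, df_rank)
# returns of a portfolio equal-weighted at the start of the period and never rebalanced
port_ret <- Return.portfolio(R=df_ret, weights=c(.5,.5))
```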
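The code above stops at per-day ranks and a single buy-and-hold portfolio. As a hedged sketch of the next step the question asks about (grouping stocks by rank each day and computing a statistic per group), here is a base-R version that uses a plain matrix instead of xts so it has no package dependencies. The top-half/bottom-half rule is a hypothetical choice, and with only two stocks each group holds one stock per day:

```r
# Wide layout as a plain matrix: rows are dates, columns are stocks
# (first five rows of the wide example, newest-first as in the question)
dates  <- as.Date(c("2015-10-02", "2015-10-01", "2015-09-30", "2015-09-29", "2015-09-28"))
prices <- cbind(AAA = c(10.1, 10.3, 10.4, 10.6, 10.7),
                BBB = c(10.7, 10.4, 10.4, 10.6, 11.0))
rownames(prices) <- as.character(dates)

# Sort rows oldest -> newest before differencing (xts does this automatically)
prices <- prices[order(dates), ]

# Simple returns p_t / p_{t-1} - 1; the first date has no return and drops out
ret <- prices[-1, ] / prices[-nrow(prices), ] - 1

# Rank across stocks within each day; apply() puts stocks in rows, so transpose
rk <- t(apply(ret, 1, rank, ties.method = "first"))

# HYPOTHETICAL rule: group 1 = bottom half of each day's ranks, group 2 = top half
n_groups <- 2
grp <- ceiling(rk * n_groups / ncol(ret))

# Mean return of the stocks in each group on each day (rows = dates, cols = groups)
group_mean <- sapply(seq_len(n_groups),
                     function(g) rowSums(ret * (grp == g)) / rowSums(grp == g))
colnames(group_mean) <- paste0("group", seq_len(n_groups))
print(group_mean)
```

With ties.method = "first", tied returns are ranked in column order, so every stock gets a distinct rank and falls into exactly one group.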