Use R or MySQL to calculate time period returns?

I'm trying to calculate various time period returns (monthly, quarterly, yearly etc.) for each unique member (identified by Code in the example below) of a data set. The data set will contain monthly pricing information for a 20 year period for approximately 500 stocks. An example of the data is below:

         Date Code    Price Dividend
1  2005-01-31  xyz  1000.00     20.0
2  2005-01-31  abc     1.00      0.1
3  2005-02-28  xyz  1030.00     20.0
4  2005-02-28  abc     1.01      0.1
5  2005-03-31  xyz  1071.20     20.0
6  2005-03-31  abc     1.03      0.1
7  2005-04-30  xyz  1124.76     20.0

I am fairly new to R, but thought that there would be a more efficient solution than looping through each Code and then each Date as shown here:

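uniqueDates <- unique(data$Date)
uniqueCodes <- unique(data$Code)
data$ret <- NA_real_

# iterate by index: for() over a Date vector drops the Date class
for (i in seq_along(uniqueDates)) {
  date <- uniqueDates[i]
  # date three months ahead
  nextDate <- seq.Date(from = date, by = "3 months", length.out = 2)[2]
  for (code in uniqueCodes) {
    curPrice <- data$Price[data$Date == date & data$Code == code]
    futPrice <- data$Price[data$Date == nextDate & data$Code == code]
    # only fill in the return when both prices exist for this code
    if (length(curPrice) == 1 && length(futPrice) == 1) {
      data$ret[data$Date == date & data$Code == code] <- (futPrice / curPrice) - 1
    }
  }
}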

This method in itself has an issue in that seq.Date does not always return the final day in the month.

Unfortunately the data is not uniform (the number of companies/codes varies over time) so using a simple row offset won't work. The calculation must match the Code and Date with the desired date offset.

I had initially tried selecting the future dates with the seq.Date function:

data$ret = (data[(data$Date == (seq.Date(from = data$Date, by="3 month", length.out=2)[2])), "Price"] / data$Price) - 1

But this generated an error, because seq.Date requires a single 'from' value:

> Error in seq.Date(from = stock_data$Date, by = "3 month", length.out =
> 2) :    'from' must be of length 1

I thought that R would be well suited to this type of calculation, but perhaps not. Since all the data is in a MySQL database, I am now thinking that it might be faster or easier to do this calculation directly in the database.

Any suggestions would be greatly appreciated.

You can do this very easily with the quantmod and xts packages. Using the data in AndresT's answer:

library(quantmod)  # loads xts too
pp1 <- reshape(df,timevar='Code',idvar='Date',direction='wide')
# create an xts object
x <- xts(pp1[,-1], pp1[,1])
# only get the "Price.*" columns
p <- getPrice(x)
# run the periodReturn function on each column
# (lapply keeps each column as an xts object and returns a list for merge)
r <- lapply(p, periodReturn, period="monthly", type="log")
# merge prior result into a multi-column object
r <- do.call(merge, r)
# rename columns
names(r) <- paste("monthly.return",
  sapply(strsplit(names(p),"\\."), "[", 2), sep=".")

Which leaves you with an r xts object containing:

           monthly.return.xyz monthly.return.abc
2005-01-31         0.00000000        0.000000000
2005-02-28         0.02955880        0.009950331
2005-03-31         0.03922071        0.019608471
2005-04-30         0.04879016                 NA
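
Note that type="log" gives log returns. As a quick check on the first xyz value, here is a minimal base R sketch (using only the first two xyz prices from the question) showing how a log return relates to a simple return via exp(r) - 1. The variable names are just for illustration.

```r
# first two xyz prices from the sample data
p0 <- 1000.00
p1 <- 1030.00

log_ret    <- log(p1 / p0)      # log return, comparable to type = "log"
simple_ret <- exp(log_ret) - 1  # the same price move as a simple return

round(log_ret, 8)
round(simple_ret, 8)
```

The log return here should agree with the 2005-02-28 entry for xyz in the output above, and the simple return is the plain 3% price change.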

Load data:

tc='
  Date Code    Price Dividend
  2005-01-31  xyz  1000.00     20.0
  2005-01-31  abc     1.00      0.1
  2005-02-28  xyz  1030.00     20.0
  2005-02-28  abc     1.01      0.1
  2005-03-31  xyz  1071.20     20.0
  2005-03-31  abc     1.03      0.1
  2005-04-30  xyz  1124.76     20.0'

df <- read.table(text = tc, header = TRUE)
df$Date <- as.Date(df$Date, "%Y-%m-%d")

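First I would reshape the data to wide format, with one row per date and one set of columns per code (reshape is in base R, so no extra package is needed):

pp1 <- reshape(df, timevar = 'Code', idvar = 'Date', direction = 'wide')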

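Then you would like to obtain monthly, quarterly, yearly, etc. returns. There are several options for that; one could be:

Convert the data to a zoo or xts object, i.e.

library(xts)
# keep pp1 as a plain data frame (the steps below use it as such)
# and store the time-indexed copy under a separate name
pp1_xts <- as.xts(pp1[2:ncol(pp1)], order.by = pp1$Date)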


# let's create a function for calculating log returns
rets <- function(x, lag = 1) {
  diff(log(x), lag)
}

Since this data is monthly, the lags for the returns will be: monthly = 1, quarterly = 3, yearly = 12. For instance, let's calculate the monthly return for xyz.

lagged <- 1  # for monthly

This calculates monthly returns for xyz, padding the first `lagged` rows with NA so the column lines up with the dates:

pp1$returns_xyz <- c(rep(NA, lagged), rets(pp1$Price.xyz, lagged))
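
To make the lag mapping concrete, here is a small self-contained sketch on the xyz prices from the question. `rets_padded` is a hypothetical helper name I am introducing here; it simply wraps diff(log(x), lag) and pads the front with NA so the result has the same length as the input.

```r
# hypothetical helper: log returns over `lag` periods, NA-padded at the front
rets_padded <- function(x, lag = 1) {
  c(rep(NA_real_, lag), diff(log(x), lag = lag))
}

# the four monthly xyz prices from the sample data
price <- c(1000.00, 1030.00, 1071.20, 1124.76)

monthly   <- rets_padded(price, 1)  # lag 1 = one month
quarterly <- rets_padded(price, 3)  # lag 3 = one quarter

cbind(price, monthly, quarterly)
```

With only four months of data, the quarterly column has a single non-missing value (April relative to January). A yearly return would use lag = 12 and needs at least 13 monthly observations.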

To get all the returns:

# names of the price columns
pricelist <- grep('Price', names(pp1), value = TRUE)

# log returns for every price column (one row fewer than pp1)
returnsmatrix <- data.frame(lapply(pp1[pricelist], rets, lag = 1))

# column names: ret.<code>
codename <- gsub("Price.", "", pricelist, fixed = TRUE)
names(returnsmatrix) <- paste('ret', codename, sep = '.')

returnsmatrix
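
Since the question mentions that the set of codes changes over time and that the calculation must match on both Code and Date, here is a rough base R sketch of a join-based alternative that stays in long format. It is not part of the approach above: `month_end_ahead` and `forward_return` are hypothetical helper names, and the sketch assumes every Date in the data is a month end and each (Date, Code) pair appears at most once.

```r
# sample data from the question (Dividend omitted for brevity)
df <- data.frame(
  Date  = as.Date(c("2005-01-31", "2005-01-31", "2005-02-28", "2005-02-28",
                    "2005-03-31", "2005-03-31", "2005-04-30")),
  Code  = c("xyz", "abc", "xyz", "abc", "xyz", "abc", "xyz"),
  Price = c(1000.00, 1.00, 1030.00, 1.01, 1071.20, 1.03, 1124.76)
)

# hypothetical helper: last day of the month `n` months after each date
month_end_ahead <- function(d, n) {
  one <- function(x) {
    first <- as.Date(format(x, "%Y-%m-01"))  # first day of x's month
    # first day of the month after the target month, minus one day
    seq(first, by = paste(n + 1, "months"), length.out = 2)[2] - 1
  }
  days <- vapply(seq_along(d), function(i) as.numeric(one(d[i])), numeric(1))
  as.Date(days, origin = "1970-01-01")
}

# hypothetical helper: simple return from each date to the month end n months later
forward_return <- function(df, n) {
  cur <- df[c("Date", "Code", "Price")]
  cur$FutDate <- month_end_ahead(cur$Date, n)
  fut <- data.frame(FutDate = df$Date, Code = df$Code, FutPrice = df$Price)
  # left join: rows without a matching future price keep ret = NA
  out <- merge(cur, fut, by = c("FutDate", "Code"), all.x = TRUE)
  out$ret <- out$FutPrice / out$Price - 1
  out[order(out$Date, out$Code), c("Date", "Code", "Price", "ret")]
}

res <- forward_return(df, 3)
res
```

On the sample data only xyz on 2005-01-31 has a price three months later, so that is the only row with a non-missing return. Using n = 1 or n = 12 would give monthly or yearly forward returns in the same way.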
