
Use R or MySQL to calculate time period returns?

I'm trying to calculate returns over various time periods (monthly, quarterly, yearly, etc.) for each unique member of a data set (identified by Code in the example below). The data set will contain monthly pricing information over a 20-year period for approximately 500 stocks. An example of the data is below:

         Date Code    Price Dividend
1  2005-01-31  xyz  1000.00     20.0
2  2005-01-31  abc     1.00      0.1
3  2005-02-28  xyz  1030.00     20.0
4  2005-02-28  abc     1.01      0.1
5  2005-03-31  xyz  1071.20     20.0
6  2005-03-31  abc     1.03      0.1
7  2005-04-30  xyz  1124.76     20.0

I am fairly new to R, but I thought there would be a more efficient solution than looping through each Code and then each Date, as shown here:

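uniqueDates <- unique(data$Date)
uniqueCodes <- unique(data$Code)
data$ret <- NA_real_  # create the result column up front

for (i in seq_along(uniqueDates)) {
  date <- uniqueDates[i]  # index instead of for (date in ...), which drops the Date class
  nextDate <- seq.Date(from = date, by = "3 months", length.out = 2)[2]
  for (code in uniqueCodes) {
    # match on both Date and Code so prices of different stocks are not mixed
    curPrice <- data$Price[data$Date == date & data$Code == code]
    futPrice <- data$Price[data$Date == nextDate & data$Code == code]
    if (length(curPrice) == 1 && length(futPrice) == 1) {
      data$ret[data$Date == date & data$Code == code] <- (futPrice / curPrice) - 1
    }
  }
}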

This method in itself has an issue in that seq.Date does not always return the final day in the month.
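To make the month-end issue concrete, here is a minimal base R sketch. The helper name month_end_after is hypothetical (it is not part of R or of the original post); it assumes the goal is "the last calendar day of the month k months ahead" and, like seq.Date, it takes a single date.

```r
# seq.Date keeps the day of the month, so a month-end start date
# does not necessarily land on a month-end
d <- as.Date("2005-02-28")
seq.Date(from = d, by = "3 months", length.out = 2)[2]
# 2005-05-28, although the month-end is 2005-05-31

# Hypothetical helper (an assumption, not from the original post):
# last calendar day of the month that is k months after `date`
month_end_after <- function(date, k) {
  first_of_month <- as.Date(format(date, "%Y-%m-01"))
  # first day of the month after the target month, minus one day
  seq(first_of_month, by = paste(k + 1, "months"), length.out = 2)[2] - 1
}

month_end_after(d, 3)
```

If the price dates are always calendar month-ends, the helper's result can be compared directly with data$Date; if they are last trading days instead, matching on year and month is likely the safer choice.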

Unfortunately the data is not uniform (the number of companies/codes varies over time), so a simple row offset won't work. The calculation must match on Code and on Date shifted by the desired offset.

I had initially tried selecting the future dates with the seq.Date function:

data$ret = (data[(data$Date == (seq.Date(from = data$Date, by="3 month", length.out=2)[2])), "Price"] / data$Price) - 1

But this generated an error, because seq.Date requires a single 'from' value:

> Error in seq.Date(from = stock_data$Date, by = "3 month", length.out = 2) : 'from' must be of length 1

I thought R would be well suited to this type of calculation, but perhaps not. Since all the data is in a MySQL database, I am now thinking it might be faster and easier to do this calculation directly in the database.

Any suggestions would be greatly appreciated.

You can do this very easily with the quantmod and xts packages. Using the data in AndresT's answer:

library(quantmod)  # loads xts too
pp1 <- reshape(df,timevar='Code',idvar='Date',direction='wide')
# create an xts object
x <- xts(pp1[,-1], pp1[,1])
# only get the "Price.*" columns
p <- getPrice(x)
# run the periodReturn function on each column;
# lapply always returns a list, which do.call(merge, ...) below requires
r <- lapply(p, periodReturn, period="monthly", type="log")
# merge prior result into a multi-column object
r <- do.call(merge, r)
# rename columns
names(r) <- paste("monthly.return",
  sapply(strsplit(names(p),"\\."), "[", 2), sep=".")

Which leaves you with an xts object r containing:

           monthly.return.xyz monthly.return.abc
2005-01-31         0.00000000        0.000000000
2005-02-28         0.02955880        0.009950331
2005-03-31         0.03922071        0.019608471
2005-04-30         0.04879016                 NA

Load data:

tc='
  Date Code    Price Dividend
  2005-01-31  xyz  1000.00     20.0
  2005-01-31  abc     1.00      0.1
  2005-02-28  xyz  1030.00     20.0
  2005-02-28  abc     1.01      0.1
  2005-03-31  xyz  1071.20     20.0
  2005-03-31  abc     1.03      0.1
  2005-04-30  xyz  1124.76     20.0'

df = read.table(text=tc,header=T)
df$Date=as.Date(df$Date,"%Y-%m-%d")

First I would organize the data by date:

# reshape() is part of base R (stats), so no extra package is needed here
pp1=reshape(df,timevar='Code',idvar='Date',direction='wide')

Next you want the monthly, quarterly, yearly, etc. returns. There are several options; one is:

Convert the data to a zoo or xts object, i.e.:

library(xts)
pp1[2:ncol(pp1)]  = as.xts(pp1[2:ncol(pp1)],order.by=pp1$Date)


#let's create a function for calculating returns.
rets<-function(x,lag=1){
  return(diff(log(x),lag))
}

Since this data is monthly, the lags for the returns are: monthly = 1, quarterly = 3, yearly = 12. For instance, let's calculate the monthly return for xyz.

lagged=1 #for monthly

This calculates the monthly returns for xyz:

pp1$returns_xyz= c(NA,rets(pp1$Price.xyz,lagged))
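The same idea extends to the other lags. The following is only a sketch using the xyz prices from the sample data; pad_rets is a hypothetical helper (not from the answer) that prepends NAs so the result has one value per price row for any lag.

```r
# xyz month-end prices from the sample data (Jan to Apr 2005)
price_xyz <- c(1000.00, 1030.00, 1071.20, 1124.76)

# log return over `lag` periods; returns length(x) - lag values
rets <- function(x, lag = 1) diff(log(x), lag = lag)

# Hypothetical helper (an assumption, not from the answer): pad with
# leading NAs so the output always has the same length as the input
pad_rets <- function(x, lag = 1) {
  c(rep(NA_real_, min(lag, length(x))), rets(x, lag))
}

monthly   <- pad_rets(price_xyz, 1)  # lag 1 on monthly data
quarterly <- pad_rets(price_xyz, 3)  # lag 3 on monthly data

# log returns add up across periods, so the three monthly returns
# sum to the single quarterly return
all.equal(sum(monthly, na.rm = TRUE), quarterly[4])
```

With only four rows, a yearly lag of 12 would return nothing but NAs; on the full 20-year data set the same call with lag = 12 should give yearly log returns.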

To get all the returns:

# collect the price columns (Price.xyz, Price.abc, ...)
pricelist = grep('Price', names(pp1), value = TRUE, fixed = TRUE)

# one column of log returns per price column; each has nrow(pp1) - 1 rows
returnsmatrix = as.data.frame(lapply(pp1[pricelist], rets, lag = 1))

# column names: ret.<code>
codename = sub("Price.", "", pricelist, fixed = TRUE)
names(returnsmatrix) = paste('ret', codename, sep = '.')

returnsmatrix
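If reshaping to wide format is not convenient, the matching on Code and shifted Date that the question asks for can also be done on the long data. This is a rough base R sketch, not part of either answer: forward_return is a hypothetical helper, and it assumes at most one row per Code per calendar month. It matches on a month counter rather than on the exact date, so it does not depend on month-end arithmetic.

```r
# Sample data from the question, kept in long format
data <- data.frame(
  Date  = as.Date(c("2005-01-31", "2005-01-31", "2005-02-28", "2005-02-28",
                    "2005-03-31", "2005-03-31", "2005-04-30")),
  Code  = c("xyz", "abc", "xyz", "abc", "xyz", "abc", "xyz"),
  Price = c(1000.00, 1.00, 1030.00, 1.01, 1071.20, 1.03, 1124.76),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not from the answers):
# simple return from each row to the same Code's row k months later
forward_return <- function(d, k) {
  lt <- as.POSIXlt(d$Date)
  # running month counter; equal for all days within one calendar month
  month_id <- (lt$year + 1900) * 12 + lt$mon
  key_now    <- paste(d$Code, month_id)
  key_future <- paste(d$Code, month_id + k)
  # row index of the matching future observation, NA if there is none
  idx <- match(key_future, key_now)
  d$Price[idx] / d$Price - 1
}

data$ret_1m <- forward_return(data, 1)
data$ret_3m <- forward_return(data, 3)
data
```

Rows without a matching future observation (for example abc three months after January, since abc has no April price) get NA, which handles the varying number of codes over time. Because match() is vectorized, this should avoid the nested loops from the question.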
