How to calculate return in data.table?

I am new to Stack Overflow and an R beginner.

I want to calculate the returns of a big data set which looks like this:

Date        C1  C2  C3
31.01.1985  NA  47  NA
28.02.1985  NA  45  NA
29.03.1985  130 56  NA
30.04.1985  140 67  NA
31.05.1985  150 48  93
28.06.1985  160 79  96
31.07.1985  160 56  94
30.08.1985  160 77  93
30.09.1985  160 66  93
31.10.1985  160 44  93
29.11.1985  160 55  93

It is a data.table, let's say it's called Prices. The columns are the companies and the values are the prices; the real data set has many more columns and rows. I want to build a new data.table containing the monthly returns. I know you can do this with the diff() function, but how do I build the new table for so many columns without for loops?

I thought of:

Returns <- diff(Prices[, names(Prices) != "Date"])

but for some reason this only returns:

[1] 1 0 0

Thanks in advance.

The reason you are getting that output is that Prices[, names(Prices) != "Date"] does not select columns. Inside a data.table, the second argument is evaluated as an expression, so it returns a logical vector:

> Prices[, names(Prices) != "Date"]
[1] FALSE  TRUE  TRUE  TRUE

Because you can do arithmetic with logicals, you can also use diff on a logical vector: FALSE is treated as 0 and TRUE as 1. So you were effectively computing diff(c(0,1,1,1)).
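To see the coercion in isolation, the following sketch reproduces the behaviour on a small made-up table (the table `dt` and its values are illustrative, not the asker's data). It then shows one way to operate on the price columns themselves by passing their names through `.SDcols`.

```r
library(data.table)

# Small illustrative table (hypothetical values, same column layout as Prices)
dt <- data.table(Date = 1:3, C1 = c(10, 11, 12), C2 = c(5, 4, 6), C3 = c(7, 7, 8))

# Inside [.data.table, j is evaluated as an expression, so this is a logical vector
sel <- dt[, names(dt) != "Date"]
print(sel)

# diff() treats FALSE/TRUE as 0/1 before differencing
print(diff(sel))

# Operating on the columns themselves: pass their names through .SDcols
cols <- setdiff(names(dt), "Date")
price_diffs <- dt[, lapply(.SD, diff), .SDcols = cols]
print(price_diffs)
```

Note that `diff` shortens each column by one element, which is why the solutions in this answer use `shift` or pad with `NA` to keep the rows aligned with the dates.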


A possible solution for what you want:

cols <- setdiff(names(Prices),"Date")

# option 1:
Prices[, paste0(cols,"_return") := lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA)), .SDcols = cols][]

# option 2:
Prices[, paste0(cols,"_return") := lapply(.SD, function(x) c(NA,diff(x))/shift(x, fill = NA)), .SDcols = cols][]

which gives:

> Prices
          Date  C1 C2 C3  C1_return   C2_return   C3_return
 1: 1985-01-31  NA 47 NA         NA          NA          NA
 2: 1985-02-28  NA 45 NA         NA -0.04255319          NA
 3: 1985-03-29 130 56 NA         NA  0.24444444          NA
 4: 1985-04-30 140 67 NA 0.07692308  0.19642857          NA
 5: 1985-05-31 150 48 93 0.07142857 -0.28358209          NA
 6: 1985-06-28 160 79 96 0.06666667  0.64583333  0.03225806
 7: 1985-07-31 160 56 94 0.00000000 -0.29113924 -0.02083333
 8: 1985-08-30 160 77 93 0.00000000  0.37500000 -0.01063830
 9: 1985-09-30 160 66 93 0.00000000 -0.14285714  0.00000000
10: 1985-10-31 160 44 93 0.00000000 -0.33333333  0.00000000
11: 1985-11-29 160 55 93 0.00000000  0.25000000  0.00000000

If you want to create a new data.table, you could use one of the following two options:

# option 1:
Returns <- Prices[, c(list(Date = Date), lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA))), .SDcols = cols]

# option 2:
Returns <- copy(Prices)
Returns[, (cols) := lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA)), .SDcols = cols]
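Since the same anonymous function appears in each option, it can also be pulled out into a named helper. The sketch below does this under the name `simple_return` (a hypothetical name, not part of data.table) and runs it on a tiny made-up table rather than the question's data.

```r
library(data.table)

# Hypothetical helper: simple return, (x_t - x_{t-1}) / x_{t-1}
simple_return <- function(x) {
  prev <- shift(x)   # lag by one row; the first element becomes NA
  (x - prev) / prev
}

# Illustrative prices (not the question's data)
toy <- data.table(Date = as.Date("2020-01-31") + 0:2,
                  C1 = c(100, 110, 99),
                  C2 = c(NA, 50, 55))
cols <- setdiff(names(toy), "Date")

# New table: keep Date, replace each price column by its return
toy_returns <- toy[, c(list(Date = Date), lapply(.SD, simple_return)), .SDcols = cols]
print(toy_returns)
```

A leading `NA` in a price column simply propagates: the first return after it is also `NA`, because there is no previous price to compare against.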

Used data:

Prices <- fread("Date        C1  C2  C3
31.01.1985  NA  47  NA
28.02.1985  NA  45  NA
29.03.1985  130 56  NA
30.04.1985  140 67  NA
31.05.1985  150 48  93
28.06.1985  160 79  96
31.07.1985  160 56  94
30.08.1985  160 77  93
30.09.1985  160 66  93
31.10.1985  160 44  93
29.11.1985  160 55  93")[, Date := as.Date(Date, "%d.%m.%Y")]

I would write a function to work on a single column of values:

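# Percentage change from each value to the next one.
# The result is aligned with the earlier row, so the last element is NA.
pc.change <- function(x) {
  (c(x[2:length(x)], NA) - x) * 100 / x
}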

And then apply this to a matrix of all the columns of values:

d <- read.table(text = "Date        C1  C2  C3
31.01.1985  NA  47  NA
28.02.1985  NA  45  NA
29.03.1985  130 56  NA
30.04.1985  140 67  NA
31.05.1985  150 48  93
28.06.1985  160 79  96
31.07.1985  160 56  94
30.08.1985  160 77  93
30.09.1985  160 66  93
31.10.1985  160 44  93
29.11.1985  160 55  93", header = TRUE)

apply(as.matrix(d[,2:4]), 2, pc.change)

This gives me:

            C1         C2        C3
 [1,]       NA  -4.255319        NA
 [2,]       NA  24.444444        NA
 [3,] 7.692308  19.642857        NA
 [4,] 7.142857 -28.358209        NA
 [5,] 6.666667  64.583333  3.225806
 [6,] 0.000000 -29.113924 -2.083333
 [7,] 0.000000  37.500000 -1.063830
 [8,] 0.000000 -14.285714  0.000000
 [9,] 0.000000 -33.333333  0.000000
[10,] 0.000000  25.000000  0.000000
[11,]       NA         NA        NA

It should then be possible to convert this into a data.table if needed.
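One way that conversion might look is sketched below, using `as.data.table()` on the matrix and adding the dates back afterwards. The input `d` here is a small made-up stand-in with the same shape as the data above, and `returns_dt` is just an illustrative name.

```r
library(data.table)

# Toy input in the same shape as `d` (illustrative values only)
d <- data.frame(Date = c("31.01.1985", "28.02.1985", "29.03.1985"),
                C1 = c(100, 110, 121),
                C2 = c(50, 40, 44))

pc.change <- function(x) {
  (c(x[2:length(x)], NA) - x) * 100 / x
}

m <- apply(as.matrix(d[, 2:3]), 2, pc.change)

# as.data.table() turns the matrix columns into table columns
returns_dt <- as.data.table(m)

# Add the dates back and move them to the front
returns_dt[, Date := as.Date(d$Date, "%d.%m.%Y")]
setcolorder(returns_dt, "Date")
print(returns_dt)
```

Keep in mind that with `pc.change` each row holds the change from that date to the next one, so the last row is `NA` rather than the first.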
