
Calculating net present value of a series of columns in R data.table

I've got a data.table with 40 columns representing income in 40 consecutive periods. I'm trying to add a variable representing the NPV of the income stream for each observation (i.e., $\sum_{t=1}^{T} \beta^{t-1} y_{i,t}$, the discounted sum of incomes).

My approach is:

dt[,NPV:=rowSums(.SD*.95^(0:39)),.SDcols=paste0("year_",1:40)]

But this gives strange results. In fact, .SD*.95^(0:39) on its own does something I don't understand. I guess the problem is that it doesn't know how to multiply .SD by the vector .95^(0:39); it must be recycling...

Given this, I tried some sort of lapply to handle the product, but that didn't work either. Specifying the problem as a matrix multiplication, .SD %*% .95^(0:39), also fails.

Any ideas about what to do? Maybe reshape and go from there...
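The reshape route does work. A minimal sketch, assuming data.table's melt() and a small hypothetical table (the values below are made up for illustration, not taken from the question): melt the year_ columns into long format, recover the period index t from each column name, then discount and sum within each id.

```r
library(data.table)

# Hypothetical toy data in the same wide layout as the question
dt <- data.table(id = 1:2,
                 year_1 = c(1, 2),
                 year_2 = c(1, 0),
                 year_3 = c(1, 4))
beta <- 0.95

# Long format: one row per (id, period)
long <- melt(dt, id.vars = "id",
             measure.vars = grep("^year_", names(dt), value = TRUE),
             variable.name = "period", value.name = "income")

# Recover t from the column name "year_t"
long[, t_idx := as.integer(sub("year_", "", as.character(period)))]

# Discount each income by beta^(t - 1) and sum within each id
npv <- long[, .(NPV = sum(income * beta^(t_idx - 1))), by = id]
npv
```

The long format costs memory for 40 periods, but it makes the weight a simple function of a column (t_idx) instead of a column position.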

For concreteness, here's an example with 5 periods that you can play with.

library(data.table)
set.seed(3654654)
dt<-data.table(id=1:10,year_1=rchisq(10,df=1),
              year_2=rchisq(10,df=1),
              year_3=rchisq(10,df=1),
              year_4=rchisq(10,df=1),
              year_5=rchisq(10,df=1))

> dt
    id     year_1     year_2       year_3      year_4      year_5
 1:  1 0.27161866 0.12764396 0.2775017833 5.210941183 0.027654609
 2:  2 2.44271387 1.21104397 0.1242118874 0.009518939 3.265443502
 3:  3 0.18095011 0.06581832 1.1619364400 0.938078133 2.238590035
 4:  4 0.02148331 3.38477084 0.1254167045 0.041640559 0.212538797
 5:  5 1.27821958 0.19046799 3.1166384038 0.586280661 0.019470595
 6:  6 0.03413820 0.68214806 0.9325970029 0.568719470 0.061664982
 7:  7 2.32055628 0.04137301 0.1810722845 0.050654213 1.377958712
 8:  8 0.95498438 0.03095528 0.7081911061 3.127335761 2.293907090
 9:  9 4.49044959 1.75553222 0.0005865227 0.207076713 0.577015216
10: 10 0.02984232 0.02522646 0.3891819870 0.178056404 0.006526457

So the Net Present Value should be:

           [,1]
 [1,] 5.1335813
 [2,] 6.3731923
 [3,] 3.9197555
 [4,] 3.5590199
 [5,] 4.7904516
 [6,] 2.0616800
 [7,] 3.6890640
 [8,] 6.1732355
 [9,] 6.8062594
[10,] 0.5630211
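As a sanity check of the formula, here is the discounted sum for observation 1 alone, with the incomes copied from row 1 of the printed table above and beta = 0.95. This is just the definition applied to one income stream, not a solution for the whole table.

```r
# Incomes of observation 1 (id = 1), copied from the printed example table
y <- c(0.27161866, 0.12764396, 0.2775017833, 5.210941183, 0.027654609)
beta <- 0.95

# Discount factor for period t is beta^(t - 1): 1, 0.95, 0.9025, ...
npv <- sum(beta^(seq_along(y) - 1) * y)
npv  # roughly 5.133581, matching [1,] of the expected output
```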

Here's what I've tried so far, and what it gives me:

> dt[,rowSums(.SD*.95^(0:4)),.SDcols=paste0("year_",1:5)]
 [1] 5.9153602 6.7002856 4.1382992 3.2458933 4.2281649
     2.2792677 3.7730338 6.4216247 6.0279123 0.5121889

(This is totally incorrect. Why? For the same reason this doesn't work:

> dt[,.SD*.95^(0:4),.SDcols=paste0("year_",1:5)]
        year_1     year_2       year_3      year_4     year_5
 1: 0.27161866 0.12764396 0.2775017833 5.210941183 0.02765461
 2: 2.32057818 1.15049177 0.1180012931 0.009042992 3.10217133
 3: 0.16330748 0.05940104 1.0486476371 0.846615515 2.02032751
 4: 0.01841926 2.90201790 0.1075291471 0.035701574 0.18222545
 5: 1.04111784 0.15513737 2.5385214589 0.477529263 0.01585892
 6: 0.03413820 0.68214806 0.9325970029 0.568719470 0.06166498
 7: 2.20452847 0.03930436 0.1720186702 0.048121502 1.30906078
 8: 0.86187340 0.02793714 0.6391424733 2.822420524 2.07025115
 9: 3.84999922 1.50514943 0.0005028699 0.177542396 0.49471842
10: 0.02430675 0.02054711 0.3169911608 0.145028054 0.00531584

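It seems to be multiplying down the rows instead of across the columns: rows 1 and 6 are unchanged, rows 2 and 7 are scaled by 0.95, and so on, as if the 5 weights were being recycled down each column.)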
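The behaviour can be reproduced on a tiny hypothetical data.frame. Base R arithmetic between a data.frame and a vector recycles the vector in column-major order, so a length-k weight vector is applied down each column rather than one weight per column. Pairing each column with its own weight (here via Map()) gives the intended column-wise product.

```r
# Hypothetical 4-row, 2-column data.frame; w is meant as one weight per column
df <- data.frame(a = c(1, 1, 1, 1), b = c(2, 2, 2, 2))
w <- c(1, 10)

# Recycled down the rows: a becomes 1, 10, 1, 10 and b becomes 2, 20, 2, 20
res <- df * w

# Column-wise: column a times w[1], column b times w[2]
fixed <- as.data.frame(Map(`*`, df, w))
```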

> dt[,.SD %*% .95^(0:4),.SDcols=paste0("year_",1:5)]
Error in .SD %*% 0.95^(0:4) : 
  requires numeric/complex matrix/vector arguments

Try this:

> dt[, as.matrix(.SD) %*% 0.95 ^ (0:4), .SDcols = -1]
           [,1]
 [1,] 5.1335813
 [2,] 6.3731923
 [3,] 3.9197555
 [4,] 3.5590199
 [5,] 4.7904516
 [6,] 2.0616800
 [7,] 3.6890640
 [8,] 6.1732355
 [9,] 6.8062594
[10,] 0.5630211

or:

as.matrix(dt[, -1]) %*% 0.95 ^ (0:4)
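Scaled up to the original 40-column problem and stored as a new column, the same matrix-product idea could look like the sketch below. The random data is hypothetical; the key steps are building one weight per column and using drop() so the 10 x 1 result is assigned as a plain vector.

```r
library(data.table)

set.seed(1)
# Hypothetical wide table with 40 income columns, as in the question
cols <- paste0("year_", 1:40)
dt <- data.table(id = 1:10)
dt[, (cols) := lapply(cols, function(x) rchisq(10, df = 1))]

# One weight per column: 0.95^(t - 1) for t = 1..40
w <- 0.95^(seq_along(cols) - 1)

# (rows x 40) %*% (40) gives one discounted sum per row
dt[, NPV := drop(as.matrix(.SD) %*% w), .SDcols = cols]
```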

Update: Minor improvement based on comments.

Here's one way that sort of takes advantage of using a data.table:

vs   <- paste0("year_",1:5)
exps <- 1:5 - 1

dt[,NPV:=Reduce(
  `+`,
  mapply(
    function(x,y) x*.95^y,
    .SD,
    exps,
    SIMPLIFY=FALSE)
),.SDcols=vs]

mapply applies the two-argument function to matching pairs of elements from .SD (the columns) and exps (the exponents), and Reduce collapses the resulting list of discounted columns by adding them with +. Of course, you can also write it on one line.
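One way to write the one-line version is with Map(), which is base R's mapply(..., SIMPLIFY = FALSE), deriving the exponents from the column positions instead of a separate exps vector. A sketch on hypothetical toy data:

```r
library(data.table)

# Hypothetical two-row example with three periods
dt <- data.table(id = 1:2, year_1 = c(1, 2), year_2 = c(1, 0), year_3 = c(1, 4))
vs <- paste0("year_", 1:3)

# Pair each column with exponent t - 1, discount it, then add the columns up
dt[, NPV := Reduce(`+`, Map(function(x, y) x * 0.95^y, .SD, seq_along(.SD) - 1)),
   .SDcols = vs]
```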

# Using a data.frame: df is your data frame. This assumes that year 1 is the
# beginning of the period, so the discount exponent is 0 for the first year
# (factor 1) and 1 for the second year (factor 0.95). In the data frame,
# year_1 is in column 2 and year_5 is the last column.

df<-data.frame(dt)
NPV<-rowSums(sapply(2:ncol(df),function(i){df[,i]*0.95^(i-2)}))
> NPV
 [1] 5.1335813 6.3731923 3.9197555 3.5590199 4.7904516 2.0616800 3.6890640 6.1732355 6.8062594 0.5630211
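Staying in base R, sweep() with MARGIN = 2 applies the j-th weight to every entry of column j, which is the across-columns product the question was after, without an explicit loop over column indices. A sketch on a hypothetical data.frame with an id column first:

```r
# Hypothetical data.frame: an id column followed by the income columns
df <- data.frame(id = 1:2, year_1 = c(1, 2), year_2 = c(1, 0), year_3 = c(1, 4))

income <- as.matrix(df[, -1])           # drop the id column
w <- 0.95^(seq_len(ncol(income)) - 1)   # one weight per column: 0.95^(t - 1)

# sweep(..., MARGIN = 2) multiplies column j by w[j]; rowSums adds across periods
NPV <- rowSums(sweep(income, 2, w, `*`))
```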
