R how to use mapply with a data table and two lists of column names

Question

I have a data table with 2x data columns and two id columns. The id columns are years with values (X2010, X2015, X2020, etc) and countries (cty1, cty2, etc). For each country, the first set of data columns (f1, f2, f3, etc) has a value only in the first row (X2010) and NAs in the rest of the rows. The second set of columns (x.f1, x.f2, x.f3, etc) has NA in the first row and different values in the rest of the rows. I want to replace the NAs in the first set of columns with the following recursive structure for each country.

f1.X2015 = f1.X2010 * x.f1.X2015
f1.X2020 = f1.X2015 * x.f1.X2020
...

I have tried the following

foods <- c("f1", "f2", "f3")
x.foods <- c("x.f1", "x.f"2, "x.f3")
res <- c("res.f1", "res.f2", "res.f3")
f.cumprod <- function(x,y) {return(first(x) * cumprod(replace(y), 1,1) * NA^(.I= 1))}

Here's the data table structure that I thought would work to generate values for the res columns.

DT[,(res) := mapply(FUN = f.cumprod, x = .SD, y = list(x.foods)), .SDcols = foods, by = c("cty")]

Here's a simplified version for just one country

set.seed(24)

 dt <- data.table(cty = c(rep("cty1", 5), rep("cty2", 5), rep("cty3", 5)), year = rep(c("X2010", "X2015", "X2020", "X2025", "X2030"), 3), 
             f1 = rep(c(0.9883415, rep(NA, 4)), 3), f2 = rep(c(1.0685221, rep(NA, 4)), 3), f3 = rep(c(1.0664189, rep(NA, 4)), 3), 
           x.f1 = rep(c(NA, rep(rnorm(4))), 3),   x.f2 = rep(c(NA, rep(rnorm(4))), 3),   x.f3 = rep(c(NA, rep(rnorm(4))), 3))

And a kludgy, and slowwww, way to get the result for one of the food items, f1.

dt.subset <- dt[, c("f1", "x.f1"), with = FALSE]

for (i in 2:nrow(dt.subset)) {
  dt.subset$f1[i] <- dt.subset$f1[i - 1] * dt.subset$x.f1[i]
}

Since I want to do this for about 170 countries and 20 food items (and with 4 scenarios), I'm hoping there is a solution along the lines of the DT code above.

Answer 1

If we are looking for a recursive function (for a single 'cty')

dt.subset[, f1 := Reduce(`*`, x.f1[-1], init = f1[1], accumulate = TRUE)]

Or with accumulate from purrr

library(purrr)
dt.subset[, f1 := accumulate(x.f1[-1], ~ .x * .y, .init = f1[1])]

Based on the OP's data 'dt', we could melt into 'long' forma, then apply the function with accumulate , dcast back to 'wide'

out <- dcast(melt(dt, measure = patterns("^f\\d+", "^x\\.f\\d+"))[, 
  accumulate(value2[-1], ~ .x * .y, .init = value1[1]), .(variable, cty)], 
  cty + rowid(variable) ~ variable, value.var = "V1")
nm1 <- grep("^f\\d+$", names(dt), value = TRUE)
setnames(out, -(1:2), nm1)

and then set the columns of interest with the new values

for(j in nm1) set(dt, i= NULL, j= j, value = out[[j]])
dt
#     cty  year          f1         f2           f3       x.f1       x.f2         x.f3
# 1: cty1 X2010  0.98834150  1.0685221  1.066418900         NA         NA           NA
# 2: cty1 X2015 -0.53951661  0.9055298 -0.904717849 -0.5458808  0.8474600 -0.848370044
# 3: cty1 X2020 -0.28949668  0.2408908 -0.002091656  0.5365853  0.2660220  0.002311942
# 4: cty1 X2025 -0.12147951  0.1070965  0.002754518  0.4196231  0.4445853 -1.316908124
# 5: cty1 X2030  0.07089875 -0.0499600  0.001647943 -0.5836272 -0.4664951  0.598269113
# 6: cty2 X2010  0.98834150  1.0685221  1.066418900         NA         NA           NA
# 7: cty2 X2015 -0.53951661  0.9055298 -0.904717849 -0.5458808  0.8474600 -0.848370044
# 8: cty2 X2020 -0.28949668  0.2408908 -0.002091656  0.5365853  0.2660220  0.002311942
# 9: cty2 X2025 -0.12147951  0.1070965  0.002754518  0.4196231  0.4445853 -1.316908124
#10: cty2 X2030  0.07089875 -0.0499600  0.001647943 -0.5836272 -0.4664951  0.598269113
#11: cty3 X2010  0.98834150  1.0685221  1.066418900         NA         NA           NA
#12: cty3 X2015 -0.53951661  0.9055298 -0.904717849 -0.5458808  0.8474600 -0.848370044
#13: cty3 X2020 -0.28949668  0.2408908 -0.002091656  0.5365853  0.2660220  0.002311942
#14: cty3 X2025 -0.12147951  0.1070965  0.002754518  0.4196231  0.4445853 -1.316908124
#15: cty3 X2030  0.07089875 -0.0499600  0.001647943 -0.5836272 -0.4664951  0.598269113

-checking with the values of 'dt.subset' for the first 'cty' after applying the OP's function

dt.subset
#            f1       x.f1
#1:  0.98834150         NA
#2: -0.53951661 -0.5458808
#3: -0.28949668  0.5365853
#4: -0.12147951  0.4196231
#5:  0.07089875 -0.5836272

Or we can do this with Map

dt[, (foods) := Map(function(x, y) accumulate(y[-1], `*`, .init = x[1]),
           mget(foods), mget(x.foods)), by = .(cty)]
dt
#     cty  year          f1         f2           f3       x.f1       x.f2         x.f3
# 1: cty1 X2010  0.98834150  1.0685221  1.066418900         NA         NA           NA
# 2: cty1 X2015 -0.53951661  0.9055298 -0.904717849 -0.5458808  0.8474600 -0.848370044
# 3: cty1 X2020 -0.28949668  0.2408908 -0.002091656  0.5365853  0.2660220  0.002311942
# 4: cty1 X2025 -0.12147951  0.1070965  0.002754518  0.4196231  0.4445853 -1.316908124
# 5: cty1 X2030  0.07089875 -0.0499600  0.001647943 -0.5836272 -0.4664951  0.598269113
# 6: cty2 X2010  0.98834150  1.0685221  1.066418900         NA         NA           NA
# 7: cty2 X2015 -0.53951661  0.9055298 -0.904717849 -0.5458808  0.8474600 -0.848370044
# 8: cty2 X2020 -0.28949668  0.2408908 -0.002091656  0.5365853  0.2660220  0.002311942
# 9: cty2 X2025 -0.12147951  0.1070965  0.002754518  0.4196231  0.4445853 -1.316908124
#10: cty2 X2030  0.07089875 -0.0499600  0.001647943 -0.5836272 -0.4664951  0.598269113
#11: cty3 X2010  0.98834150  1.0685221  1.066418900         NA         NA           NA
#12: cty3 X2015 -0.53951661  0.9055298 -0.904717849 -0.5458808  0.8474600 -0.848370044
#13: cty3 X2020 -0.28949668  0.2408908 -0.002091656  0.5365853  0.2660220  0.002311942
#14: cty3 X2025 -0.12147951  0.1070965  0.002754518  0.4196231  0.4445853 -1.316908124
#15: cty3 X2030  0.07089875 -0.0499600  0.001647943 -0.5836272 -0.4664951  0.598269113

Or if we are using cumprod (there were some errors in the OP's f.cumprod function). It could be changed to

f.cumprod <- function(x, y)  cumprod(c(x[1], y[-1]))
dt[, (foods) := Map(f.cumprod,  mget(foods), mget(x.foods)), by = .(cty)]
dt
#     cty  year          f1         f2           f3       x.f1       x.f2         x.f3
# 1: cty1 X2010  0.98834150  1.0685221  1.066418900         NA         NA           NA
# 2: cty1 X2015 -0.53951661  0.9055298 -0.904717849 -0.5458808  0.8474600 -0.848370044
# 3: cty1 X2020 -0.28949668  0.2408908 -0.002091656  0.5365853  0.2660220  0.002311942
# 4: cty1 X2025 -0.12147951  0.1070965  0.002754518  0.4196231  0.4445853 -1.316908124
# 5: cty1 X2030  0.07089875 -0.0499600  0.001647943 -0.5836272 -0.4664951  0.598269113
# 6: cty2 X2010  0.98834150  1.0685221  1.066418900         NA         NA           NA
# 7: cty2 X2015 -0.53951661  0.9055298 -0.904717849 -0.5458808  0.8474600 -0.848370044
# 8: cty2 X2020 -0.28949668  0.2408908 -0.002091656  0.5365853  0.2660220  0.002311942
# 9: cty2 X2025 -0.12147951  0.1070965  0.002754518  0.4196231  0.4445853 -1.316908124
#10: cty2 X2030  0.07089875 -0.0499600  0.001647943 -0.5836272 -0.4664951  0.598269113
#11: cty3 X2010  0.98834150  1.0685221  1.066418900         NA         NA           NA
#12: cty3 X2015 -0.53951661  0.9055298 -0.904717849 -0.5458808  0.8474600 -0.848370044
#13: cty3 X2020 -0.28949668  0.2408908 -0.002091656  0.5365853  0.2660220  0.002311942
#14: cty3 X2025 -0.12147951  0.1070965  0.002754518  0.4196231  0.4445853 -1.316908124
#15: cty3 X2030  0.07089875 -0.0499600  0.001647943 -0.5836272 -0.4664951  0.598269113

NOTE: The values are the same for each 'cty' because the example dataset values are the same for each 'cty'

R how to use mapply with a data table and two lists of column names

Question

1 answers

solution1
2 ACCPTED 2018-04-29 03:27:32

R how to use mapply with a data table and two lists of column names

Question

1 answers

solution1 2 ACCPTED 2018-04-29 03:27:32

solution1
2 ACCPTED 2018-04-29 03:27:32