I have a data table with 2x data columns and two id columns. The id columns are years with values (X2010, X2015, X2020, etc) and countries (cty1, cty2, etc). For each country, the first set of data columns (f1, f2, f3, etc) has a value only in the first row (X2010) and NAs in the rest of the rows. The second set of columns (x.f1, x.f2, x.f3, etc) has NA in the first row and different values in the rest of the rows. I want to replace the NAs in the first set of columns with the following recursive structure for each country.
f1.X2015 = f1.X2010 * x.f1.X2015
f1.X2020 = f1.X2015 * x.f1.X2020
...
I have tried the following
foods <- c("f1", "f2", "f3")
x.foods <- c("x.f1", "x.f"2, "x.f3")
res <- c("res.f1", "res.f2", "res.f3")
f.cumprod <- function(x,y) {return(first(x) * cumprod(replace(y), 1,1) * NA^(.I= 1))}
Here's the data table structure that I thought would work to generate values for the res columns.
DT[,(res) := mapply(FUN = f.cumprod, x = .SD, y = list(x.foods)), .SDcols = foods, by = c("cty")]
Here's a simplified version for just one country
set.seed(24)
dt <- data.table(cty = c(rep("cty1", 5), rep("cty2", 5), rep("cty3", 5)), year = rep(c("X2010", "X2015", "X2020", "X2025", "X2030"), 3),
f1 = rep(c(0.9883415, rep(NA, 4)), 3), f2 = rep(c(1.0685221, rep(NA, 4)), 3), f3 = rep(c(1.0664189, rep(NA, 4)), 3),
x.f1 = rep(c(NA, rep(rnorm(4))), 3), x.f2 = rep(c(NA, rep(rnorm(4))), 3), x.f3 = rep(c(NA, rep(rnorm(4))), 3))
And a kludgy, and slowwww, way to get the result for one of the food items, f1.
dt.subset <- dt[, c("f1", "x.f1"), with = FALSE]
for (i in 2:nrow(dt.subset)) {
dt.subset$f1[i] <- dt.subset$f1[i - 1] * dt.subset$x.f1[i]
}
Since I want to do this for about 170 countries and 20 food items (and with 4 scenarios), I'm hoping there is a solution along the lines of the DT code above.
If we are looking for a recursive function (for a single 'cty')
dt.subset[, f1 := Reduce(`*`, x.f1[-1], init = f1[1], accumulate = TRUE)]
Or with accumulate
from purrr
library(purrr)
dt.subset[, f1 := accumulate(x.f1[-1], ~ .x * .y, .init = f1[1])]
Based on the OP's data 'dt', we could melt
into 'long' forma, then apply the function with accumulate
, dcast
back to 'wide'
out <- dcast(melt(dt, measure = patterns("^f\\d+", "^x\\.f\\d+"))[,
accumulate(value2[-1], ~ .x * .y, .init = value1[1]), .(variable, cty)],
cty + rowid(variable) ~ variable, value.var = "V1")
nm1 <- grep("^f\\d+$", names(dt), value = TRUE)
setnames(out, -(1:2), nm1)
and then set
the columns of interest with the new values
for(j in nm1) set(dt, i= NULL, j= j, value = out[[j]])
dt
# cty year f1 f2 f3 x.f1 x.f2 x.f3
# 1: cty1 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 2: cty1 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 3: cty1 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 4: cty1 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
# 5: cty1 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
# 6: cty2 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 7: cty2 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 8: cty2 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 9: cty2 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#10: cty2 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
#11: cty3 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
#12: cty3 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
#13: cty3 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
#14: cty3 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#15: cty3 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
-checking with the values of 'dt.subset' for the first 'cty' after applying the OP's function
dt.subset
# f1 x.f1
#1: 0.98834150 NA
#2: -0.53951661 -0.5458808
#3: -0.28949668 0.5365853
#4: -0.12147951 0.4196231
#5: 0.07089875 -0.5836272
Or we can do this with Map
dt[, (foods) := Map(function(x, y) accumulate(y[-1], `*`, .init = x[1]),
mget(foods), mget(x.foods)), by = .(cty)]
dt
# cty year f1 f2 f3 x.f1 x.f2 x.f3
# 1: cty1 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 2: cty1 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 3: cty1 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 4: cty1 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
# 5: cty1 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
# 6: cty2 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 7: cty2 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 8: cty2 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 9: cty2 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#10: cty2 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
#11: cty3 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
#12: cty3 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
#13: cty3 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
#14: cty3 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#15: cty3 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
Or if we are using cumprod
(there were some errors in the OP's f.cumprod
function). It could be changed to
f.cumprod <- function(x, y) cumprod(c(x[1], y[-1]))
dt[, (foods) := Map(f.cumprod, mget(foods), mget(x.foods)), by = .(cty)]
dt
# cty year f1 f2 f3 x.f1 x.f2 x.f3
# 1: cty1 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 2: cty1 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 3: cty1 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 4: cty1 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
# 5: cty1 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
# 6: cty2 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 7: cty2 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 8: cty2 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 9: cty2 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#10: cty2 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
#11: cty3 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
#12: cty3 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
#13: cty3 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
#14: cty3 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#15: cty3 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
NOTE: The values are the same for each 'cty' because the example dataset values are the same for each 'cty'
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.