I'm currently running simulations based on certain data. The goal is to generate a column where the first value is based on one formula, and the second, third and fourth values are each based on the previous value (e.g. entry no. 2 depends on no. 1, and no. 3 on no. 2). I've solved this by running the mutate function 3 times over. However, with tidiness in mind, I would like to either have a short loop or use one of the apply functions to execute all 3 repeats at once. Any suggestions?
Here's an example:
library(dplyr)  # provides %>% and mutate()
sampleframe <- data.frame("value1" = c(15,18,22,19),
"value2" = c(12,14,13,12),
"parameter" = c(0.8,NA,NA,NA))
sampleframe <- sampleframe %>%
mutate("value3" = value2 * parameter)
This generates the data frame with the first row of the "value3" column filled in from one formula. I would then like to generate the remaining 3 rows, so I run this line:
sampleframe <- sampleframe %>%
mutate(`value3`= ifelse(is.na(value3) == FALSE, value3,lag(value3) * value2))
which generates the second row value whilst retaining the first row value. I then have to run the same command an extra two times to get the last 2 rows to fill. It works in the sense that it preserves previous values while always generating the next one, but it seems remarkably inefficient. Back to my question, is there a better way to do this? (I assume there is)
Edit: Given the purrr solution, I ran into the following problem when expanding my above example. If I want to add a constant in the expression, the solution doesn't work anymore:
sampleframe <- sampleframe %>%
mutate(`value3`= ifelse(is.na(value3) == FALSE, value3,lag(value3) * value2 + value1))
In the purrr solution:
sampleframe %>%
mutate(
value3 = if_else(row_number() == 1, value2*parameter, value2),
value3 = accumulate(value3, prod)
)
Each term in value3 is multiplied by value2. The problem is that adding the constant after value2:
sampleframe %>%
mutate(
value3 = if_else(row_number() == 1, value2*parameter, value2 + value1),
value3 = accumulate(value3, prod)
)
Doesn't yield the desired result, since I don't want value1 to be multiplied by value2. Adding it in the second term:
sampleframe %>%
mutate(
value3 = if_else(row_number() == 1, value2*parameter, value2),
value3 = accumulate(value3, prod) + value1
)
also doesn't work, because it adds value1 as a block at the very end, meaning that rows 1 and 2 are computed correctly, but rows 3 and 4 are not. I tried every way I could think of to make this command work, but I'm not familiar enough with the purrr package to find a fix. Any ideas?
Limiting my answer to your current approach, you can make things more efficient by using a for loop:
number_iterations = 3
# setup
sampleframe <- data.frame("value1" = c(15,18,22,19),
"value2" = c(12,14,13,12),
"parameter" = c(0.8,NA,NA,NA))
sampleframe <- sampleframe %>%
mutate("value3" = value2 * parameter)
# run
for(ii in 1:number_iterations){
sampleframe <- sampleframe %>%
mutate(`value3`= ifelse(is.na(value3) == FALSE, value3,lag(value3) * value2))
}
The for loop will run your code as many times as you specify in number_iterations.
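As a rough sketch of a variation on this idea, the loop can also walk the rows directly, so the number of passes follows from the number of rows rather than being set by hand. This assumes the recurrence from the question (each new value3 is the previous value3 times the current value2); the names n and value3 as standalone variables are just for illustration.

```r
# Same example data as in the question
sampleframe <- data.frame(value1 = c(15, 18, 22, 19),
                          value2 = c(12, 14, 13, 12),
                          parameter = c(0.8, NA, NA, NA))

n <- nrow(sampleframe)
value3 <- numeric(n)

# Row 1 uses its own formula
value3[1] <- sampleframe$value2[1] * sampleframe$parameter[1]

# Every later row depends on the row above it;
# seq_len(n)[-1] is empty when there is only one row, so the loop is skipped
for (i in seq_len(n)[-1]) {
  value3[i] <- value3[i - 1] * sampleframe$value2[i]
}

sampleframe$value3 <- value3
```

Each row is visited once here, instead of recomputing the whole column on every pass.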
However, I would usually recommend operations like mutate
for working on an entire column at once, rather than updating one value at a time. So you will likely get further improvements in efficiency from investigating different data structures and solution approaches.
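One possible whole-column version, as a sketch: because each value is just the previous value times value2, the column is a running product, and base R's cumprod() can compute it in one mutate call. This assumes the simple multiplicative recurrence from the original question; the name result is only for illustration.

```r
library(dplyr)

sampleframe <- data.frame(value1 = c(15, 18, 22, 19),
                          value2 = c(12, 14, 13, 12),
                          parameter = c(0.8, NA, NA, NA))

# Row 1 contributes value2 * parameter, every later row contributes value2;
# cumprod() then multiplies these factors down the column in a single pass
result <- sampleframe %>%
  mutate(value3 = cumprod(if_else(row_number() == 1, value2 * parameter, value2)))
```

This only works while the recurrence is a pure product; once a term is added at each step, a running product no longer describes it.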
You can use accumulate()
from {purrr}
and multiply the numbers sequentially.
sampleframe %>%
mutate(
value3 = if_else(row_number() == 1, value2*parameter, value2),
value3 = accumulate(value3, prod)
)
# value1 value2 parameter value3
# 1 15 12 0.8 9.6
# 2 18 14 NA 134.4
# 3 22 13 NA 1747.2
# 4 19 12 NA 20966.4
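For the follow-up in the question's edit, where each new value should be the previous value3 times value2 plus value1, one possible approach is to accumulate over row positions instead of over the values themselves, and to pass the first-row value through .init. This is only a sketch under that reading of the recurrence; the name result is for illustration.

```r
library(dplyr)
library(purrr)

sampleframe <- data.frame(value1 = c(15, 18, 22, 19),
                          value2 = c(12, 14, 13, 12),
                          parameter = c(0.8, NA, NA, NA))

result <- sampleframe %>%
  mutate(
    value3 = accumulate(
      seq_len(n())[-1],                                   # row positions 2, 3, ...
      function(prev, i) prev * value2[i] + value1[i],     # previous value3 * value2 + value1
      .init = value2[1] * parameter[1]                    # row 1 uses its own formula
    )
  )
```

Because the function receives the previous result and the row position, value1 is added once per step and is not multiplied by value2 in the same step.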