
R generating a cell value based on the previous cell value in that same column

I'm currently running simulations based on certain data. The end goal is to generate a column where the first value is based on one formula, and the second, third and fourth values are each based on the previous value (e.g. entry no. 2 depends on no. 1, no. 3 on no. 2). I've solved this by running the mutate function three times over. However, with tidiness in mind, I would like either a short loop or one of the apply functions to execute all three repeats at once. Any suggestions?

Here's an example:

sampleframe <- data.frame("value1" = c(15,18,22,19),
                          "value2" = c(12,14,13,12),
                          "parameter" = c(0.8,NA,NA,NA))

sampleframe <- sampleframe %>%
  mutate("value3" = value2 * parameter)

This generates the data frame with the first row of the "value3" column filled in, based on one formula. I would then like to generate the last three rows. I run this line:

sampleframe <- sampleframe %>%
  mutate(`value3`= ifelse(is.na(value3) == FALSE,  value3,lag(value3) * value2))

which generates the second-row value whilst retaining the first-row value. I then have to run the same command two more times to fill the last two rows. It works in the sense that it preserves previous values while always generating the next one, but it seems remarkably inefficient. Back to my question: is there a better way to do this? (I assume there is.)

Edit: Given the purrr solution, I ran into the following problem when expanding the example above. If I want to add a constant to the expression, the solution doesn't work anymore:

sampleframe <- sampleframe %>%
  mutate(`value3`= ifelse(is.na(value3) == FALSE,  value3,lag(value3) * value2 + value1))

In the purrr solution:

sampleframe %>% 
  mutate(
    value3 = if_else(row_number() == 1, value2*parameter, value2),
    value3 = accumulate(value3, prod)
  )

Each term in value3 is multiplied by value2. The problem is that adding the constant after value2:

sampleframe %>% 
  mutate(
    value3 = if_else(row_number() == 1, value2*parameter, value2 + value1),
    value3 = accumulate(value3, prod)
  )

doesn't yield the desired result, since I don't want value1 to be multiplied by value2. Adding it in the second step:

sampleframe %>% 
  mutate(
    value3 = if_else(row_number() == 1, value2*parameter, value2),
    value3 = accumulate(value3, prod) + value1
  )

also doesn't work, because it adds value1 as a block at the very end, meaning that lines 1 and 2 are computed correctly, but 3 and 4 are not. I tried every way I could think of to make this command work, but I'm not familiar enough with the purrr package to find a fix. Any ideas?

Limiting my answer to your current approach, you can make things more efficient by using a for loop:

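library(dplyr)  # provides %>%, mutate() and lag()

number_iterations <- 3

# setup
sampleframe <- data.frame("value1" = c(15,18,22,19),
                          "value2" = c(12,14,13,12),
                          "parameter" = c(0.8,NA,NA,NA))

# first row of value3 comes from the starting formula; the rest are NA
sampleframe <- sampleframe %>%
  mutate("value3" = value2 * parameter)

# run: each pass fills in the next missing row from the row above it
for(ii in 1:number_iterations){
  sampleframe <- sampleframe %>%
    mutate(`value3`= ifelse(is.na(value3) == FALSE,  value3,lag(value3) * value2))
}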

The for loop will run your code as many times as you specify in number_iterations.

However, I would usually recommend operations like mutate for working on an entire column at once, rather than updating one value at a time. So you will likely get further improvements in efficiency from investigating different data structures and solution approaches.
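As one possible alternative along those lines, here is a minimal sketch in base R that fills the column in a single pass instead of re-running mutate over the whole column once per row. The loop index i and the temporary vector value3 are names chosen for illustration only; the data frame is the one from the question.

```r
sampleframe <- data.frame(value1 = c(15, 18, 22, 19),
                          value2 = c(12, 14, 13, 12),
                          parameter = c(0.8, NA, NA, NA))

# Pre-allocate the result and seed the first row from the starting formula
value3 <- numeric(nrow(sampleframe))
value3[1] <- sampleframe$value2[1] * sampleframe$parameter[1]

# Every later row is the previous row's result times the current value2
for (i in seq_len(nrow(sampleframe))[-1]) {
  value3[i] <- value3[i - 1] * sampleframe$value2[i]
}

sampleframe$value3 <- value3
```

Each row is computed exactly once, so the number of passes no longer has to be set by hand. For this purely multiplicative case, cumprod(c(value2[1] * parameter[1], value2[-1])) should give the same column without a loop.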

You can use accumulate() from {purrr} and multiply the numbers sequentially.

sampleframe %>% 
  mutate(
    value3 = if_else(row_number() == 1, value2*parameter, value2),
    value3 = accumulate(value3, prod)
  )


#   value1 value2 parameter  value3
# 1     15     12       0.8     9.6
# 2     18     14        NA   134.4
# 3     22     13        NA  1747.2
# 4     19     12        NA 20966.4
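For the follow-up in the question's edit, where value1 is added at every step (value3[n] = value3[n-1] * value2[n] + value1[n]), prod alone cannot express the recurrence. One way to sketch it, assuming that recurrence is what the asker intends, is to accumulate over row indices and seed the first value with .init. The names prev, i and result are chosen for illustration.

```r
library(dplyr)
library(purrr)

sampleframe <- data.frame(value1 = c(15, 18, 22, 19),
                          value2 = c(12, 14, 13, 12),
                          parameter = c(0.8, NA, NA, NA))

result <- sampleframe %>%
  mutate(
    value3 = accumulate(
      seq_len(n())[-1],                                # rows 2..n
      function(prev, i) prev * value2[i] + value1[i],  # step from the row above
      .init = value2[1] * parameter[1]                 # row 1: starting formula
    )
  )

# result$value3 is 9.6, 152.4, 2003.2, 24057.4
```

Because .init supplies the first row, the accumulated vector has one element per row, and value1 is added after each multiplication rather than once at the end.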
