Problem using lag() inside mutate() (tidyverse)

Question

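I'm trying to add another column to a data frame, where each value in the new column is a function of the previous value in that same new column and the current row's value. I've stripped out irrelevant code and kept to simple numbers so I can follow the answers here. Given the following data frame:

  x
1 1
2 2
3 3
4 4
5 5
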
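The next column (y) adds 5 to x plus the previous row's value of y. The first row has no previous value of y, so I define it as 0. The first value of y is therefore x+5+0, i.e. 1+5+0 = 6. The second is x+5+y (from the 1st row), i.e. 2+5+6 = 13. The data frame should look like this:

  x  y
1 1  6
2 2 13
3 3 21
4 4 30
5 5 40
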
I tried this with case_when() and lag():

library(dplyr)

test_df <- data.frame(x = 1:5)
test_df %>% mutate(y = case_when(x == 1 ~ 6,
                                 x > 1 ~ x + 5 + lag(y)))

Error: Problem with `mutate()` column `y`.
ℹ `y = case_when(x == 1 ~ 6, x > 1 ~ x + 5 + lag(y))`.
x object 'y' not found
Run `rlang::last_error()` to see where the error occurred.

I thought y would be defined once the first row was computed. Is there a better way to do this? Thanks!

Answer 1

You don't need lag() at all. A cumsum() is enough.
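Why cumsum() is enough: unrolling the recurrence from the question, with the "previous y" of the first row taken as 0 as the question defines it, every y value turns out to be a running total of x + 5. For example, y_2 = (1 + 5) + (2 + 5) = 13 and y_3 = 6 + 7 + 8 = 21.

```latex
\begin{aligned}
y_n &= x_n + 5 + y_{n-1}, \qquad y_0 = 0 \\
    &= (x_n + 5) + (x_{n-1} + 5) + y_{n-2} \\
    &= \sum_{i=1}^{n} (x_i + 5)
\end{aligned}
```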

library(dplyr)

test_df %>% mutate(y = cumsum(x + 5))

#>   x  y
#> 1 1  6
#> 2 2 13
#> 3 3 21
#> 4 4 30
#> 5 5 40

Data

test_df <- data.frame(x = 1:5)
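For comparison, here is a minimal sketch of the row-by-row recurrence the question describes, written as a plain loop and checked against cumsum(). The helper name `recurrence_loop` is hypothetical, not part of dplyr or the original answers. It also suggests why `lag(y)` fails: `mutate()` evaluates the right-hand side once for the whole column, so `y` does not exist yet while it is being computed, whereas a loop can read the value it stored on the previous iteration.

```r
# Hypothetical helper (not from dplyr or the answers above):
# computes y[i] = x[i] + 5 + y[i - 1] one row at a time.
recurrence_loop <- function(x) {
  y <- numeric(length(x))
  prev <- 0  # the question treats the "previous y" of the first row as 0
  for (i in seq_along(x)) {
    y[i] <- x[i] + 5 + prev
    prev <- y[i]  # remember this row's y for the next iteration
  }
  y
}

x <- 1:5
loop_y <- recurrence_loop(x)
cumsum_y <- cumsum(x + 5)

print(loop_y)
print(all.equal(loop_y, cumsum_y))  # both approaches should agree
```

A loop like this is only worth it when the recurrence cannot be rewritten as a cumulative sum; here cumsum() is shorter and vectorized.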

Answer 2

We can also use purrr::accumulate here:

library(dplyr)
library(purrr)

test_df <- data.frame(x = 1:5)
test_df %>% mutate(y = accumulate(x + 5, ~ .x + .y))

  x  y
1 1  6
2 2 13
3 3 21
4 4 30
5 5 40

We can also use accumulate with a regular base R function instead of the formula shorthand:

# acc = running total so far, cur = current value of x + 5
test_df %>% mutate(y = accumulate(x + 5, function(acc, cur) acc + cur))
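If you would rather avoid purrr and dplyr entirely, base R's Reduce() with `accumulate = TRUE` keeps every intermediate result, which plays the same role as purrr::accumulate() here. This is a minimal sketch of that swap, not part of the original answer:

```r
# Base R only: Reduce(`+`, ..., accumulate = TRUE) returns the running sums
# of x + 5, i.e. the same values as cumsum(x + 5) or purrr::accumulate().
test_df <- data.frame(x = 1:5)
test_df$y <- Reduce(`+`, test_df$x + 5, accumulate = TRUE)
print(test_df)
```

Because each step only adds, cumsum() remains the simplest choice; Reduce() or accumulate() become useful when the step function is something other than addition.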