Calculate the difference between rows, but keep the raw value by group

Question

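I have a data frame with values accumulated by group, and I need to recalculate the original values from them. The lag function works well here, but at the start of each group I get the difference across the group boundary, when what I want there is the first number of that group's sequence (and not NA either).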

How can I get the first number in each group instead of NA or the between-group difference?

My dummy data:

# make example
df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = sample(1:15))

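First I compute the cumulative values, then convert them back to row values, i.e. value should equal valBack. The suggested mutate(valBack = c(cumsum[1], (cumsum - lag(cumsum))[-1])) replaces only the very first (NA) value with the correct one, but it does not do the same for the first number of each group.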

df %>%
  group_by(id) %>%
  dplyr::mutate(cumsum = cumsum(value)) %>% 
  mutate(valBack = c(cumsum[1], (cumsum - lag(cumsum))[-1]))  # skip the first value in a lag vector

Result:

# A tibble: 15 x 5
# Groups:   id [3]
      id  hour value cumsum valBack
   <int> <int> <int>  <int>   <int>
 1     1     1    10     10      10   # this works
 2     1     2    13     23      13
 3     1     3     8     31       8
 4     1     4     4     35       4
 5     1     5     9     44       9
 6     2     1    12     12     -32    # a new group starts here; the value should be 12, but it is -32
 7     2     2    14     26      14
 8     2     3     5     31       5
 9     2     4    15     46      15
10     2     5     1     47       1
11     3     1     2      2     -45    # should be 2 instead of -45
12     3     2     3      5       3
13     3     3     6     11       6
14     3     4    11     22      11
15     3     5     7     29       7

I would like a safe calculation that makes valBack equal to value. (Of course, in the real data I have no value column, only the cumsum column.)

Answer 1

Try:

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(
    cumsum = cumsum(value),
    valBack = c(cumsum[1], (cumsum - lag(cumsum))[-1])
  )

This gives:

# A tibble: 15 x 5
# Groups:   id [3]
      id  hour value cumsum valBack
   <int> <int> <int>  <int>   <int>
 1     1     1    10     10      10
 2     1     2    13     23      13
 3     1     3     8     31       8
 4     1     4     4     35       4
 5     1     5     9     44       9
 6     2     1    12     12      12
 7     2     2    14     26      14
 8     2     3     5     31       5
 9     2     4    15     46      15
10     2     5     1     47       1
11     3     1     2      2       2
12     3     2     3      5       3
13     3     3     6     11       6
14     3     4    11     22      11
15     3     5     7     29       7
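
The pipeline here is essentially the one from the question, so the different output deserves a comment. One plausible explanation (an assumption, not confirmed anywhere in the post) is that the question's second, un-namespaced mutate was resolved to a function from another attached package such as plyr, which ignores dplyr grouping. A minimal sketch that namespaces every call and checks the result; the seed is added here only for reproducibility:

```r
library(dplyr)

set.seed(42)  # assumption: not in the original post, added for reproducibility
df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = sample(1:15))

res <- df %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(
    cumsum  = cumsum(value),
    # first element of each group is kept as-is, the rest are differences
    valBack = c(cumsum[1], (cumsum - dplyr::lag(cumsum))[-1])
  ) %>%
  dplyr::ungroup()

# valBack should reproduce value in every row, including group starts
all(res$valBack == res$value)
#> [1] TRUE
```

If this check returns FALSE in your session, the order in which packages were attached is worth inspecting with search().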

Answer 2

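While the accepted answer works, it is more complicated than it needs to be. If you look at the lag function, you will see that it takes several arguments: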

dplyr::lag(x, n = 1L, default = NA, order_by = NULL, ...)

Here we can use default and set it to 0 to get the desired output. See below:

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(cumsum  = cumsum(value), 
         rawdata = cumsum - lag(cumsum, default = 0))

#> # A tibble: 15 x 5
#> # Groups:   id [3]
#>       id  hour value cumsum rawdata
#>    <int> <int> <int>  <int>   <dbl>
#>  1     1     1     2      2       2
#>  2     1     2     1      3       1
#>  3     1     3    13     16      13
#>  4     1     4    15     31      15
#>  5     1     5    10     41      10
#>  6     2     1     3      3       3
#>  7     2     2     8     11       8
#>  8     2     3     4     15       4
#>  9     2     4    12     27      12
#> 10     2     5    11     38      11
#> 11     3     1    14     14      14
#> 12     3     2     6     20       6
#> 13     3     3     5     25       5
#> 14     3     4     7     32       7
#> 15     3     5     9     41       9
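
The question notes that the real data has only the cumsum column, and the lag signature above also offers order_by. The following is a hedged sketch of how the two might combine when rows are not already sorted within each group. The cumulative values are copied from the question's output; the reversed row order and the name cum_df are illustrative assumptions:

```r
library(dplyr)

# Cumulative values from the question's output; no value column,
# as in the real data described in the question.
cum_df <- data.frame(id = rep(1:3, each = 5),
                     hour = rep(1:5, 3),
                     cumsum = c(10, 23, 31, 35, 44,
                                12, 26, 31, 46, 47,
                                2, 5, 11, 22, 29))

# Reverse the rows to simulate unsorted input (illustrative assumption).
shuffled <- cum_df[nrow(cum_df):1, ]

res <- shuffled %>%
  group_by(id) %>%
  # order_by = hour makes lag() follow the hour order inside each group,
  # whatever the physical row order is
  mutate(rawdata = cumsum - dplyr::lag(cumsum, default = 0, order_by = hour)) %>%
  ungroup() %>%
  arrange(id, hour)

res$rawdata
#>  [1] 10 13  8  4  9 12 14  5 15  1  2  3  6 11  7
```

The recovered numbers match the value column shown in the question. Sorting with arrange(id, hour) before the mutate would be an alternative to order_by.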

Answer 1
1 Accepted 2019-07-20 08:57:37

Answer 2
1 2019-07-20 17:19:16
