Conditional running count (cumulative sum) with reset in R (dplyr)

Question

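I am trying to compute a running count (i.e., a cumulative sum) that is conditional on other variables and that resets on specific values of another variable. I am working in R and would prefer a dplyr-based solution if possible.

I would like to create a variable for the running count, cumulative, according to the following algorithm:

Compute the running count (cumulative) within each combination of id and age.
Increment the running count (cumulative) by 1 for every subsequent trial where accuracy = 0, block = 2, and condition = 1.
Reset the running count (cumulative) to 0 for each trial where accuracy = 1, block = 2, and condition = 1; the next increment then resumes at 1 (not at the previous number).
For each trial where block != 2 or condition != 1, leave the running count (cumulative) as NA.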
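
As a rough illustration, the four rules can be written as a plain loop in base R. This is only a sketch for a single id/age combination; the function name running_count is hypothetical and not part of the original question, and a full solution would still have to apply it per group.

```r
# Plain-loop sketch of the rules above (running_count is a hypothetical name).
running_count <- function(accuracy, block, condition) {
  out <- rep(NA_real_, length(accuracy))
  count <- 0
  for (i in seq_along(accuracy)) {
    if (block[i] == 2 && condition[i] == 1) {
      # accuracy == 1 resets the counter; accuracy == 0 increments it
      count <- if (accuracy[i] == 1) 0 else count + 1
      out[i] <- count
    }
    # every other trial stays NA and leaves the counter untouched
  }
  out
}

# One id/age combination: the trial with condition == 2 is skipped,
# and the count carries on afterwards until the reset.
block     <- c(1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
condition <- c(1, 1, 1, 1, 1, 2, 1, 1, 1, 1)
accuracy  <- c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0)
result <- running_count(accuracy, block, condition)
print(result)
```

The loop makes the intended behaviour explicit: an NA trial does not interrupt the count, only an accuracy = 1 trial does.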

Here is a minimal working example:

mydata <- data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1),
                     age = c(1,1,1,1,1,1,1,1,1,1,2),
                     block = c(1,1,2,2,2,2,2,2,2,2,2),
                     trial = c(1,2,1,2,3,4,5,6,7,8,1),
                     condition = c(1,1,1,1,1,2,1,1,1,1,1),
                     accuracy = c(0,0,0,0,0,0,0,1,0,0,0)
)

id  age block   trial   condition   accuracy
1   1   1       1       1           0
1   1   1       2       1           0
1   1   2       1       1           0
1   1   2       2       1           0
1   1   2       3       1           0
1   1   2       4       2           0
1   1   2       5       1           0
1   1   2       6       1           1
1   1   2       7       1           0
1   1   2       8       1           0
1   2   2       1       1           0

The expected output is:

id  age block   trial   condition   accuracy    cumulative
1   1   1       1       1           0           NA
1   1   1       2       1           0           NA
1   1   2       1       1           0           1
1   1   2       2       1           0           2
1   1   2       3       1           0           3
1   1   2       4       2           0           NA
1   1   2       5       1           0           4
1   1   2       6       1           1           0
1   1   2       7       1           0           1
1   1   2       8       1           0           2
1   2   2       1       1           0           1

Answer 1

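We can use case_when to assign the values we need based on our conditions. We then add an extra group_by condition, built with cumsum, that starts a new group each time temp is 0. In the final mutate step, we temporarily replace the NA values in temp with 0, take the cumsum, and then put the NA values back in place for the final output.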

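library(dplyr)

mydata %>%
    group_by(id, age) %>%
    # temp: 1 = increment, 0 = reset, NA = trial is not counted
    mutate(temp = case_when(accuracy == 0 & block == 2 & condition == 1 ~ 1, 
                            accuracy == 1 & block == 2 & condition == 1 ~ 0, 
                            TRUE ~ NA_real_)) %>%
    ungroup() %>%
    # start a new group at every reset (temp == 0); NA rows never start one
    group_by(id, age, group = cumsum(replace(temp == 0, is.na(temp), 0))) %>%
    # sum with NA treated as 0, then restore NA where temp was NA
    mutate(cumulative = replace(cumsum(replace(temp, is.na(temp), 0)),
                          is.na(temp), NA)) %>%
    # group is a grouping variable, so select() keeps it in the output
    select(-temp, -group)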


#    group    id   age block trial condition accuracy cumulative
#   <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>      <dbl>
# 1     0     1     1     1     1         1        0         NA
# 2     0     1     1     1     2         1        0         NA
# 3     0     1     1     2     1         1        0          1
# 4     0     1     1     2     2         1        0          2
# 5     0     1     1     2     3         1        0          3
# 6     0     1     1     2     4         2        0         NA
# 7     0     1     1     2     5         1        0          4
# 8     1     1     1     2     6         1        1          0
# 9     1     1     1     2     7         1        0          1
#10     1     1     1     2     8         1        0          2
#11     1     1     2     2     1         1        0          1
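
To see why the extra grouping works, the same three steps can be traced on a bare vector in base R. This is a sketch under the assumption that temp holds the values case_when produces for the id = 1, age = 1 rows; the helper names is_reset and filled are illustrative only.

```r
# temp as produced by case_when() for the id = 1, age = 1 rows
temp <- c(NA, NA, 1, 1, 1, NA, 1, 0, 1, 1)

# A new group starts at every reset (temp == 0); NA rows do not start one
is_reset <- !is.na(temp) & temp == 0
group <- cumsum(is_reset)

# Cumulative sum within each group, treating NA as 0 while summing
filled <- ifelse(is.na(temp), 0, temp)
cumulative <- ave(filled, group, FUN = cumsum)

# Put the NA values back where temp was NA
cumulative[is.na(temp)] <- NA

print(group)
print(cumulative)
```

Because the NA row is summed as 0 rather than starting a new group, the count continues from 3 to 4 across it.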

Answer 2

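Here is an option using data.table. Create a binary column ('ind') by matching the pasted values of 'accuracy', 'block', and 'condition' against the custom values with match. Then, grouped by the run-length id of that binary column together with 'id' and 'age', take the cumulative sum of 'ind' and assign it (:=) to a new column ('Cumulative').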

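library(data.table)

# ind: 0 for a reset row ("121"), 1 for an increment row ("021"), NA otherwise
setDT(mydata)[, ind := match(do.call(paste0, .SD), c("121", "021")) - 1,
    .SDcols = c("accuracy", "block", "condition")
     ][, Cumulative := cumsum(ind), .(rleid(ind), id, age)  # cumsum within each run of ind
      ][, ind := NULL][]  # drop the helper column and print the result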
#    id age block trial condition accuracy Cumulative
# 1:  1   1     1     1         1        0         NA
# 2:  1   1     1     2         1        0         NA
# 3:  1   1     2     1         1        0          1
# 4:  1   1     2     2         1        0          2
# 5:  1   1     2     3         1        0          3
# 6:  1   1     2     4         2        0         NA
# 7:  1   1     2     5         1        1          0
# 8:  1   1     2     6         1        0          1
# 9:  1   1     2     7         1        0          2
#10:  1   2     2     1         1        0          1
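
The encoding and the run-based grouping can also be traced in base R. The sketch below is not the data.table code itself: it uses rle as a stand-in for rleid (recoding NA to -1 first, since rle treats each NA as its own run), and it covers only the id = 1, age = 1 rows of mydata.

```r
accuracy  <- c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0)
block     <- c(1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
condition <- c(1, 1, 1, 1, 1, 2, 1, 1, 1, 1)

# "121" -> 0 (reset), "021" -> 1 (increment), anything else -> NA
key <- paste0(accuracy, block, condition)
ind <- match(key, c("121", "021")) - 1

# Run id: a new run starts whenever ind changes (NA recoded to -1 for rle)
codes  <- ifelse(is.na(ind), -1, ind)
runs   <- rle(codes)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Cumulative sum of ind within each run
cumulative <- ave(ind, run_id, FUN = cumsum)

print(ind)
print(run_id)
print(cumulative)
```

One consequence of grouping by runs is visible here: the NA row at trial 4 ends the run of 1s, so trial 5 starts again at 1 instead of continuing to 4 as in the expected output from the question.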

Answer 1
2, accepted, 2018-10-24 03:35:02

Answer 2
2, 2018-10-24 04:01:50
