Using if_else in dplyr to create a row count that resets to 1 within groups

Question

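I am trying to create a row count based on a condition: when the condition is not met, the value should reset to 0 instead of continuing to count, and when the condition is met again, the count should restart at 1. I group by id so that the count does not spill over into other cross-sectional units. Here is an example of what the data looks like: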

# A tibble: 5 × 4
#  ccode  year    id civ_int
#  <dbl> <dbl> <dbl>   <dbl>
#1    90  1967     1       0
#2    90  1968     1       0
#3    90  1969     1       0
#4    90  1970     1       0
#5    90  1971     1       0

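The problem is that, within an id, the count does not reset to 1. It simply keeps counting once civ_int returns to 0. For example, the count may have reached 22, at which point it correctly drops to 0 while civ_int = 1. However, when civ_int returns to 0, the count resumes at 23. Here is the code I used, for reference: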

merged <- merged %>%
  mutate(civ_int = if_else(
    deaths >= 25, 1, 0
  )) %>%
  group_by(id) %>%
  mutate(low_years = as.numeric(row_number()
  )) %>%
  mutate(low_years = cumsum(if_else(
    civ_int == 0, 1, 0
  ))) %>%
  mutate(low_years = if_else(
    civ_int == 1, 0, low_years
  )) %>%
  ungroup()

Here is an example of the problem I get with this code:

# A tibble: 20 × 5
#      id  year deaths civ_int low_years
#   <dbl> <dbl>  <dbl>   <dbl>     <dbl>
# 1     1  1983      0       0        17
# 2     1  1984      0       0        18
# 3     1  1985      0       0        19
# 4     1  1986      0       0        20
# 5     1  1987      0       0        21
# 6     1  1988      0       0        22
# 7     1  1989    363       1         0
# 8     1  1990    522       1         0
# 9     1  1991    308       1         0
#10     1  1992    273       1         0
#11     1  1993    132       1         0
#12     1  1994    226       1         0
#13     1  1995     74       1         0
#14     1  1996      2       0        23
#15     1  1997      2       0        24
#16     1  1998      1       0        25
#17     1  1999      0       0        26
#18     1  2000      0       0        27
#19     1  2001      0       0        28
#20     1  2002      2       0        29

low_years should reset to 1 in 1996 and count up from there, but that is not happening. Any ideas?

Answer 1

Introducing an additional grouping variable may work for you:

library(dplyr)

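df %>%
  # flag years with at least 25 deaths
  mutate(civ_int = if_else(deaths >= 25, 1, 0)) %>%
  # grp increments each time civ_int changes, so every run of equal values is its own group
  group_by(id, grp = cumsum(civ_int != lag(civ_int, default = 1))) %>%
  # count the civ_int == 0 rows within each run; runs of 1s stay at 0
  mutate(low_years = cumsum(civ_int == 0)) %>%
  ungroup() %>%
  select(-grp)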
# A tibble: 20 × 5
      id  year deaths civ_int low_years
   <int> <int>  <int>   <int>     <int>
 1     1  1983      0       0         1
 2     1  1984      0       0         2
 3     1  1985      0       0         3
 4     1  1986      0       0         4
 5     1  1987      0       0         5
 6     1  1988      0       0         6
 7     1  1989    363       1         0
 8     1  1990    522       1         0
 9     1  1991    308       1         0
10     1  1992    273       1         0
11     1  1993    132       1         0
12     1  1994    226       1         0
13     1  1995     74       1         0
14     1  1996      2       0         1
15     1  1997      2       0         2
16     1  1998      1       0         3
17     1  1999      0       0         4
18     1  2000      0       0         5
19     1  2001      0       0         6
20     1  2002      2       0         7
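
To see why the extra grouping variable makes the count restart, it can help to keep grp in the output instead of dropping it. The following is only a sketch on made-up data: the toy data frame and its values are hypothetical, and unlike the code above it computes the run id inside each id (using first(civ_int) as the lag default), which is a small variation rather than the answer's exact approach.

```r
library(dplyr)

# Hypothetical toy data: one id whose deaths cross the threshold of 25 once
toy <- data.frame(
  id     = c(1L, 1L, 1L, 1L, 1L, 1L),
  deaths = c(0L, 3L, 40L, 60L, 1L, 2L)
)

res <- toy %>%
  mutate(civ_int = if_else(deaths >= 25, 1, 0)) %>%
  group_by(id) %>%
  # grp is a run id: it goes up by one whenever civ_int changes within an id
  mutate(grp = cumsum(civ_int != lag(civ_int, default = first(civ_int)))) %>%
  # regroup by id and run, so cumsum() starts over at the top of every run
  group_by(id, grp) %>%
  mutate(low_years = cumsum(civ_int == 0)) %>%
  ungroup()

print(res)
```

Each run of identical civ_int values gets its own grp value, so the second run of zeros is counted from 1 again rather than continuing from the first run.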

Data

df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), year = 1983:2002, deaths = c(0L, 
0L, 0L, 0L, 0L, 0L, 363L, 522L, 308L, 273L, 132L, 226L, 74L, 
2L, 2L, 1L, 0L, 0L, 0L, 2L)), class = "data.frame", row.names = c(NA, 
-20L))

Answer 2

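Using data.table:

library(data.table)
# rleid() labels each run of consecutive equal values; grouping by it restarts cumsum() at every run
setDT(df)[, low_years := cumsum(deaths < 25), .(id, rleid(deaths >= 25))]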

    id year deaths civ_int low_years
 1:  1 1983      0       0         1
 2:  1 1984      0       0         2
 3:  1 1985      0       0         3
 4:  1 1986      0       0         4
 5:  1 1987      0       0         5
 6:  1 1988      0       0         6
 7:  1 1989    363       1         0
 8:  1 1990    522       1         0
 9:  1 1991    308       1         0
10:  1 1992    273       1         0
11:  1 1993    132       1         0
12:  1 1994    226       1         0
13:  1 1995     74       1         0
14:  1 1996      2       0         1
15:  1 1997      2       0         2
16:  1 1998      1       0         3
17:  1 1999      0       0         4
18:  1 2000      0       0         5
19:  1 2001      0       0         6
20:  1 2002      2       0         7
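
If you would rather not load any package, the same run-based idea can probably be expressed in base R with rle() and sequence(). This is a rough sketch only: count_low_runs is a hypothetical helper name, the toy data is made up, and the threshold of 25 is taken from the question.

```r
# Hypothetical helper: counts consecutive below-threshold rows,
# restarting at 1 after every above-threshold stretch
count_low_runs <- function(deaths, threshold = 25) {
  below <- deaths < threshold
  runs <- rle(below)
  # sequence() counts 1..n inside each run; multiplying by `below`
  # zeroes out the rows that are at or above the threshold
  sequence(runs$lengths) * below
}

# Hypothetical toy data with two ids
toy <- data.frame(
  id     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  deaths = c(0L, 40L, 1L, 2L, 3L, 4L, 30L)
)

# ave() applies the helper separately within each id
toy$low_years <- ave(toy$deaths, toy$id, FUN = count_low_runs)
print(toy)
```

Because ave() splits by id first, a run of low years never carries over from one id to the next.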
