R (dplyr): Count the "1" observations preceding each "0" in a column, by ID

Question

I have a dataset with three variables: ID, referredvisit, and timeperiod. ID represents the individual who visits the clinic, while referredvisit indicates whether a referral was recommended at that observation. In other words, referredvisit == 0 means that the individual was not referred to another clinic, while referredvisit == 1 means that the patient was recommended a referral. timeperiod shows the sequence in which the individuals come in.

My dataset looks like this:

timeperiod <- 1:18
ID <- c("TOM", "TOM", "SALLY", "SALLY", "RICHIE", "TOM", "TOM", "SALLY", "RICHIE", "RICHIE", "RICHIE", "SALLY", "TOM", "TOM", "TOM", "RICHIE", "RICHIE", "RICHIE")
referredvisit <- c(0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0)
df <- cbind(timeperiod, ID, referredvisit)
df <- as.data.frame(df)

For every row with referredvisit == 0, I would like to count, by ID, how many rows of "1"s precede it, going back until the beginning of the column (for the first 0) or until the previous 0 (for the rest of the 0s). I want to create a column that stores this count. My result for the dataset should look like this:

df$result <- c(0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 1, 0, 2, 0)

I have been trying to follow this link, but it doesn't seem to work, as the link assumes that the IDs are already sorted. I was thinking that perhaps dplyr might help, but I can't figure it out either. I would deeply appreciate it if anyone could help me with this!

Thank you in advance!

EDIT: For better visualisation, the result will look like this. But this is only after I manually sort it by ID. Because my original data set will contain thousands of rows, it is difficult for me to sort the IDs manually.

Answer 1

The differences between the positions of successive zeros, minus 1, give the number of ones preceding each zero. count_ones performs that calculation for a single ID; its argument is assumed to be a logical vector that is TRUE at the zero positions. ave is then used to apply it to every ID. No packages are used.

count_ones <- function(is0) replace(is0, is0, diff(which(c(TRUE, is0))) - 1)
transform(df, result = ave(referredvisit == 0, ID, FUN = count_ones))

giving:

   timeperiod     ID referredvisit result
1           1    TOM             0      0
2           2    TOM             1      0
3           3  SALLY             1      0
4           4  SALLY             1      0
5           5 RICHIE             0      0
6           6    TOM             1      0
7           7    TOM             0      2
8           8  SALLY             1      0
9           9 RICHIE             0      0
10         10 RICHIE             0      0
11         11 RICHIE             1      0
12         12  SALLY             0      3
13         13    TOM             0      0
14         14    TOM             1      0
15         15    TOM             0      1
16         16 RICHIE             1      0
17         17 RICHIE             0      2
18         18 RICHIE             0      0
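
To see why the difference of zero positions gives the count, it may help to trace count_ones by hand on a single ID. The sketch below uses TOM's values taken from the question's data; the variable names (tom, zero_pos, gaps, tom_result) are illustrative and not part of the original answer.

```r
# TOM's referredvisit values in timeperiod order (rows 1, 2, 6, 7, 13, 14, 15)
tom <- c(0, 1, 1, 0, 0, 1, 0)
is0 <- tom == 0

# Prepending TRUE acts as a sentinel "zero" before the first row,
# so the first real zero also gets a gap to measure against
zero_pos <- which(c(TRUE, is0))   # 1 2 5 6 8
gaps <- diff(zero_pos) - 1        # 0 2 0 1

# Write each gap into its zero position; all other positions become 0
tom_result <- replace(is0, is0, gaps)
print(tom_result)                 # 0 0 0 2 0 0 1
```

These values match the result column for TOM's rows in the output above.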

Answer 2

Here is a tidyverse approach that reproduces your expected result (in column result2):

library(dplyr)

df %>%
    mutate(referredvisit = as.numeric(as.character(referredvisit))) %>%
    arrange(ID) %>%
    group_by(ID) %>%
    mutate(
        flag = c(0, diff(referredvisit) < 0),
        grp = cumsum(flag)) %>%
    group_by(ID, grp) %>%
    mutate(cms = cumsum(referredvisit)) %>%
    ungroup() %>%
    mutate(result2 = ifelse(flag == 1, lag(cms), 0)) %>%
    select(-cms, -grp, -flag)
## A tibble: 18 x 5
#   timeperiod ID     referredvisit result result2
#   <fct>      <fct>          <dbl>  <dbl>   <dbl>
# 1 5          RICHIE            0.     0.      0.
# 2 9          RICHIE            0.     0.      0.
# 3 10         RICHIE            0.     0.      0.
# 4 11         RICHIE            1.     0.      0.
# 5 16         RICHIE            1.     0.      0.
# 6 17         RICHIE            0.     2.      2.
# 7 18         RICHIE            0.     0.      0.
# 8 3          SALLY             1.     0.      0.
# 9 4          SALLY             1.     0.      0.
#10 8          SALLY             1.     0.      0.
#11 12         SALLY             0.     3.      3.
#12 1          TOM               0.     0.      0.
#13 2          TOM               1.     0.      0.
#14 6          TOM               1.     0.      0.
#15 7          TOM               0.     2.      2.
#16 13         TOM               0.     0.      0.
#17 14         TOM               1.     0.      0.
#18 15         TOM               0.     1.      1.
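
The intermediate columns flag, grp and cms are dropped at the end of the pipeline, so it can be useful to look at them for one ID. The following is a minimal base R sketch for RICHIE's values from the question; prev_cms is a hypothetical stand-in for dplyr's lag(cms), and ave is used here only to mimic the grouped cumsum.

```r
# RICHIE's referredvisit values in timeperiod order (rows 5, 9, 10, 11, 16, 17, 18)
x <- c(0, 0, 0, 1, 1, 0, 0)

# flag marks a drop from 1 to 0, i.e. a zero that directly follows a run of ones
flag <- c(0, diff(x) < 0)            # 0 0 0 0 0 1 0

# grp starts a new block at every flagged zero
grp <- cumsum(flag)                  # 0 0 0 0 0 1 1

# cms counts the ones seen so far inside each block
cms <- ave(x, grp, FUN = cumsum)     # 0 0 0 1 2 0 0

# The flagged zero picks up the count from the row just before it
prev_cms <- c(NA, head(cms, -1))     # stand-in for dplyr::lag(cms)
result2 <- ifelse(flag == 1, prev_cms, 0)
print(result2)                       # 0 0 0 0 0 2 0
```

Only the flagged zero (timeperiod 17) receives a non-zero count, which agrees with the RICHIE rows in the output above.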

Update

To keep the original ordering, you could do:

library(dplyr)
library(tibble)  # provides rowid_to_column()

df %>%
    rowid_to_column("row") %>%
    mutate(referredvisit = as.numeric(as.character(referredvisit))) %>%
    arrange(ID) %>%
    group_by(ID) %>%
    mutate(
        flag = c(0, diff(referredvisit) < 0),
        grp = cumsum(flag)) %>%
    group_by(ID, grp) %>%
    mutate(cms = cumsum(referredvisit)) %>%
    ungroup() %>%
    mutate(result2 = ifelse(flag == 1, lag(cms), 0)) %>%
    arrange(row) %>%
    select(-cms, -grp, -flag, -row)
## A tibble: 18 x 5
#   timeperiod ID     referredvisit result result2
#   <fct>      <fct>          <dbl>  <dbl>   <dbl>
# 1 1          TOM               0.     0.      0.
# 2 2          TOM               1.     0.      0.
# 3 3          SALLY             1.     0.      0.
# 4 4          SALLY             1.     0.      0.
# 5 5          RICHIE            0.     0.      0.
# 6 6          TOM               1.     0.      0.
# 7 7          TOM               0.     2.      2.
# 8 8          SALLY             1.     0.      0.
# 9 9          RICHIE            0.     0.      0.
#10 10         RICHIE            0.     0.      0.
#11 11         RICHIE            1.     0.      0.
#12 12         SALLY             0.     3.      3.
#13 13         TOM               0.     0.      0.
#14 14         TOM               1.     0.      0.
#15 15         TOM               0.     1.      1.
#16 16         RICHIE            1.     0.      0.
#17 17         RICHIE            0.     2.      2.
#18 18         RICHIE            0.     0.      0.
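
As a possible alternative that is not part of the original answer: a grouped mutate in dplyr keeps the rows in their original order, so the sort and the helper row id may not be needed at all. The sketch below assumes the data is built with data.frame() so that referredvisit is numeric, and it reuses the count_ones helper from Answer 1; out and expected are illustrative names.

```r
library(dplyr)

# Same data as in the question, built directly as a data frame
df <- data.frame(
  timeperiod = 1:18,
  ID = c("TOM", "TOM", "SALLY", "SALLY", "RICHIE", "TOM", "TOM", "SALLY",
         "RICHIE", "RICHIE", "RICHIE", "SALLY", "TOM", "TOM", "TOM",
         "RICHIE", "RICHIE", "RICHIE"),
  referredvisit = c(0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0)
)

# Helper from Answer 1: number of ones preceding each zero, 0 elsewhere
count_ones <- function(is0) replace(is0, is0, diff(which(c(TRUE, is0))) - 1)

# group_by() + mutate() computes per ID without reordering the rows
out <- df %>%
  group_by(ID) %>%
  mutate(result = count_ones(referredvisit == 0)) %>%
  ungroup()

# Compare with the expected vector given in the question
expected <- c(0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 1, 0, 2, 0)
all(out$result == expected)
```

Whether this is preferable to the pipeline above is a matter of taste; it trades the explicit intermediate columns for a shorter call.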
