R (dplyr): count number of “0” observations before “1” in a column by ID

I have a dataset with three variables: ID, referredvisit, and timeperiod. ID represents the individual who visits the clinic, while referredvisit indicates whether a referral was recommended at that visit. In other words, referredvisit == 0 means the individual is not referred to another clinic, while referredvisit == 1 means the patient is recommended a referral. timeperiod gives the order in which the individuals come in.

My dataset looks like this:

timeperiod <- 1:18
ID <- c("TOM", "TOM", "SALLY", "SALLY", "RICHIE", "TOM", "TOM", "SALLY", "RICHIE", "RICHIE", "RICHIE", "SALLY", "TOM", "TOM", "TOM", "RICHIE", "RICHIE", "RICHIE")
referredvisit <- c(0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0)
df <- cbind(timeperiod, ID, referredvisit)
df <- as.data.frame(df)

For every row with referredvisit == 0, I would like to count, by ID, how many rows of 1s precede it, going back until the beginning of the column (for the first 0) or until the previous 0 (for the remaining 0s). I want to create a column that stores this count. My result for the dataset should look like this:

df$result <- c(0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 1, 0, 2, 0)

I have been trying to follow this link, but it doesn't seem to work because it assumes the data are already sorted by ID. I thought dplyr might help, but I couldn't figure anything out with it either. I would deeply appreciate any help with this!

Thank you in advance!

EDIT: For better visualisation, the result will look like this, but only after I manually sort it by ID. My original dataset will contain thousands of rows, so it is difficult for me to sort by ID manually.
[image: enter image description here]

Within a single ID, the difference between the positions of consecutive zeros, minus 1, gives the number of ones preceding each zero. count_ones performs that calculation for a single ID; its argument is assumed to be a logical vector that is TRUE at the zero positions. ave is then used to apply it to every ID. No packages are used.

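# is0 is TRUE at the zeros of one ID; the leading TRUE is a sentinel marking the start of the group
count_ones <- function(is0) replace(is0, is0, diff(which(c(TRUE, is0))) - 1)
# ave() applies count_ones within each ID and returns the values in the original row order
transform(df, result = ave(referredvisit == 0, ID, FUN = count_ones))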

giving:

   timeperiod     ID referredvisit result
1           1    TOM             0      0
2           2    TOM             1      0
3           3  SALLY             1      0
4           4  SALLY             1      0
5           5 RICHIE             0      0
6           6    TOM             1      0
7           7    TOM             0      2
8           8  SALLY             1      0
9           9 RICHIE             0      0
10         10 RICHIE             0      0
11         11 RICHIE             1      0
12         12  SALLY             0      3
13         13    TOM             0      0
14         14    TOM             1      0
15         15    TOM             0      1
16         16 RICHIE             1      0
17         17 RICHIE             0      2
18         18 RICHIE             0      0
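To see why the position differences work, the intermediate values can be traced for a single ID. This is only an illustrative sketch: the vector tom is typed in by hand from TOM's rows in the example data, and the names pos, gaps and res are introduced here for the walkthrough.

```r
# TOM's referredvisit values in row order (rows 1, 2, 6, 7, 13, 14, 15 of df),
# copied by hand from the example data
tom <- c(0, 1, 1, 0, 0, 1, 0)
is0 <- tom == 0

# Positions of the sentinel and of the zeros; the zeros are shifted by one
# because of the leading TRUE
pos <- which(c(TRUE, is0))

# Gap between consecutive positions, minus 1, is the number of 1s in between
gaps <- diff(pos) - 1

# Write the gaps into the zero positions; the remaining entries are FALSE,
# which becomes 0 when the vector is coerced to numeric
res <- replace(is0, is0, gaps)
print(res)
```

The printed vector should line up with the result column for TOM's rows in the output above.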

Here is a tidyverse approach that reproduces your expected result (in column result2):

library(dplyr)

df %>%
    mutate(referredvisit = as.numeric(as.character(referredvisit))) %>%
    arrange(ID) %>%
    group_by(ID) %>%
    mutate(
        flag = c(0, diff(referredvisit) < 0),
        grp = cumsum(flag)) %>%
    group_by(ID, grp) %>%
    mutate(cms = cumsum(referredvisit)) %>%
    ungroup() %>%
    mutate(result2 = ifelse(flag == 1, lag(cms), 0)) %>%
    select(-cms, -grp, -flag)
## A tibble: 18 x 5
#   timeperiod ID     referredvisit result result2
#   <fct>      <fct>          <dbl>  <dbl>   <dbl>
# 1 5          RICHIE            0.     0.      0.
# 2 9          RICHIE            0.     0.      0.
# 3 10         RICHIE            0.     0.      0.
# 4 11         RICHIE            1.     0.      0.
# 5 16         RICHIE            1.     0.      0.
# 6 17         RICHIE            0.     2.      2.
# 7 18         RICHIE            0.     0.      0.
# 8 3          SALLY             1.     0.      0.
# 9 4          SALLY             1.     0.      0.
#10 8          SALLY             1.     0.      0.
#11 12         SALLY             0.     3.      3.
#12 1          TOM               0.     0.      0.
#13 2          TOM               1.     0.      0.
#14 6          TOM               1.     0.      0.
#15 7          TOM               0.     2.      2.
#16 13         TOM               0.     0.      0.
#17 14         TOM               1.     0.      0.
#18 15         TOM               0.     1.      1.
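It may help to look at the helper columns before they are dropped. The sketch below runs the same steps on RICHIE's rows only; the data frame richie is typed in by hand from the example data, and steps is just a name chosen here. flag marks a 0 that directly follows a 1, grp numbers the stretches between such zeros, and cms counts the 1s seen so far within a stretch, so the cms value just before a flagged 0 is the number of 1s preceding it.

```r
library(dplyr)

# RICHIE's visits in time order, copied by hand from the example data
richie <- data.frame(
  timeperiod    = c(5, 9, 10, 11, 16, 17, 18),
  referredvisit = c(0, 0, 0, 1, 1, 0, 0)
)

steps <- richie %>%
  mutate(
    flag = c(0, diff(referredvisit) < 0),  # 1 on a 0 that directly follows a 1
    grp  = cumsum(flag)                    # stretch id, incremented at each flagged 0
  ) %>%
  group_by(grp) %>%
  mutate(cms = cumsum(referredvisit)) %>%  # running count of 1s within the stretch
  ungroup() %>%
  mutate(result2 = ifelse(flag == 1, lag(cms), 0))  # take the count just before the flagged 0

print(steps)
```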

Update

To keep the original ordering you could do

library(dplyr)
library(tibble)  # provides rowid_to_column()

df %>%
    rowid_to_column("row") %>%
    mutate(referredvisit = as.numeric(as.character(referredvisit))) %>%
    arrange(ID) %>%
    group_by(ID) %>%
    mutate(
        flag = c(0, diff(referredvisit) < 0),
        grp = cumsum(flag)) %>%
    group_by(ID, grp) %>%
    mutate(cms = cumsum(referredvisit)) %>%
    ungroup() %>%
    mutate(result2 = ifelse(flag == 1, lag(cms), 0)) %>%
    arrange(row) %>%
    select(-cms, -grp, -flag, -row)
## A tibble: 18 x 5
#   timeperiod ID     referredvisit result result2
#   <fct>      <fct>          <dbl>  <dbl>   <dbl>
# 1 1          TOM               0.     0.      0.
# 2 2          TOM               1.     0.      0.
# 3 3          SALLY             1.     0.      0.
# 4 4          SALLY             1.     0.      0.
# 5 5          RICHIE            0.     0.      0.
# 6 6          TOM               1.     0.      0.
# 7 7          TOM               0.     2.      2.
# 8 8          SALLY             1.     0.      0.
# 9 9          RICHIE            0.     0.      0.
#10 10         RICHIE            0.     0.      0.
#11 11         RICHIE            1.     0.      0.
#12 12         SALLY             0.     3.      3.
#13 13         TOM               0.     0.      0.
#14 14         TOM               1.     0.      0.
#15 15         TOM               0.     1.      1.
#16 16         RICHIE            1.     0.      0.
#17 17         RICHIE            0.     2.      2.
#18 18         RICHIE            0.     0.      0.
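As a possible variation, not part of the answer above: group_by() does not need sorted input, so the sorting and row-id steps might be skipped if lag() is also evaluated within each ID. A minimal sketch, assuming the data frame is built with data.frame() so that referredvisit is already numeric; the name out and the extra group_by(ID) before lag() are introduced here.

```r
library(dplyr)

# Example data rebuilt with data.frame() so the columns keep their types
df <- data.frame(
  timeperiod = 1:18,
  ID = c("TOM", "TOM", "SALLY", "SALLY", "RICHIE", "TOM", "TOM", "SALLY",
         "RICHIE", "RICHIE", "RICHIE", "SALLY", "TOM", "TOM", "TOM",
         "RICHIE", "RICHIE", "RICHIE"),
  referredvisit = c(0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0)
)

out <- df %>%
  group_by(ID) %>%
  mutate(
    flag = c(0, diff(referredvisit) < 0),  # 1 on a 0 that directly follows a 1
    grp  = cumsum(flag)
  ) %>%
  group_by(ID, grp) %>%
  mutate(cms = cumsum(referredvisit)) %>%
  group_by(ID) %>%                         # regroup so lag() stays within one ID
  mutate(result2 = ifelse(flag == 1, lag(cms), 0)) %>%
  ungroup() %>%
  select(-cms, -grp, -flag)

print(out)
```

Because the rows are never reordered, out keeps the original timeperiod order without a separate row index.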
