
R - Create new indices based on group_id and conditional statement

I am working with a data frame (let's call it MyData) that looks like the one below. What I want to do is group by PatientKey and create a new ID called NewId. Within the same PatientKey, every time TimeBetweenTests is > 14 the new ID should increase by 1 and stay at that value until either a new PatientKey shows up or another TimeBetweenTests > 14 occurs for the same PatientKey.

PatientKey             TimeBetweenTests     NewId
1                      0                    NewId should be 1 (first patient)
1                      0                    NewId should be 1
1                      1                    NewId should be 1
1                      2                    NewId should be 1
2                      3                    NewId should be 2 (new patient)
3                      4                    NewId should be 3 (new patient)
3                      16                   NewId should be 4 (same patient but TimeBetweenTests > 14)
3                      80                   NewId should be 5 (same patient but TimeBetweenTests > 14)
4                      3                    NewId should be 6 (new patient)
4                      0                    NewId should be 6
4                      90                   NewId should be 7 (same patient but TimeBetweenTests > 14)
4                      110                  NewId should be 8 (same patient but TimeBetweenTests > 14)
5                      3                    NewId should be 9 (new patient)
5                      3                    NewId should be 9
5                      3                    NewId should be 9

etc.

I have tried using dplyr for this, but the problem is that the subsequent values do not change when I try code similar to:

MyData <- MyData %>%
  group_by(PatientKey) %>%
  mutate(NewId = ifelse(TimeBetweenTests > 14, lag(NewId), NewId))

Does anyone have a convenient dplyr or data.table solution for this, or alternatively a for-loop approach?

Try this:

library(dplyr)
# New ID when the patient changes or the gap exceeds 14 (default = 0 assumes no PatientKey is 0)
df %>%
  mutate(NewID = cumsum(lag(PatientKey, default = 0) != PatientKey | TimeBetweenTests > 14))

Output

   PatientKey TimeBetweenTests NewID
        <dbl>            <dbl> <int>
 1          1                0     1
 2          1                0     1
 3          1                1     1
 4          1                2     1
 5          2                3     2
 6          3                4     3
 7          3               16     4
 8          3               80     5
 9          4                3     6
10          4                0     6
11          4               90     7
12          4              110     8
13          5                3     9
14          5                3     9
15          5                3     9
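
For readers who want to see why the cumulative sum works, here is a minimal base-R sketch of the same idea without dplyr. It is an illustration rather than the answerer's code: the helper vectors new_patient and long_gap are names made up for this example, and it assumes the rows are already sorted by patient and test order. Each row is flagged TRUE if it starts a new patient or has a gap above 14, and cumsum() then counts the flags seen so far.

```r
# Example data copied from the question
MyData <- data.frame(
  PatientKey       = c(1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5),
  TimeBetweenTests = c(0, 0, 1, 2, 3, 4, 16, 80, 3, 0, 90, 110, 3, 3, 3)
)

n <- nrow(MyData)

# TRUE on the first row and wherever PatientKey differs from the previous row
# (new_patient is a hypothetical helper name, not from the original post)
new_patient <- c(TRUE, MyData$PatientKey[-1] != MyData$PatientKey[-n])

# TRUE wherever the gap since the previous test exceeds 14
# (long_gap is likewise a hypothetical helper name)
long_gap <- MyData$TimeBetweenTests > 14

# Each TRUE starts a new group; cumsum() turns the flags into a running ID
MyData$NewID <- cumsum(new_patient | long_gap)

print(MyData)
```

Because the first row is flagged explicitly instead of being compared against a sentinel such as 0, this variant does not depend on which values PatientKey can take. A row that is both a new patient and has a gap above 14 still increases the ID by only 1, since the two flags are combined with | before summing.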


 