
Select last two rows of each group with certain value on a variable in R

If possible, I would like to select the last two rows of each group (ID) that have a valid value (i.e., not NA) on my outcome variable (outcome).

Sample data looks like this:

df <- read.table(text="
                      ID       outcome
                 1    800033   3
                 2    800033   3
                 3    800033   NA   
                 4    800033   2  
                 5    800033   1  
                 15   800076   2
                 16   800076   NA
                 17   800100   4     
                 18   800100   4  
                 19   800100   4  
                 20   800100   3   
                 30   800125   2   
                 31   800125   1   
                 32   800125   NA", header=TRUE)

If a participant does not have two valid values on my outcome variable (e.g., ID == 800076), I would still like to keep the last two rows of that group (ID). All other rows should be deleted.

My final data set would therefore look like this:

     ID       outcome
4    800033   2  
5    800033   1  
15   800076   2
16   800076   NA
19   800100   4  
20   800100   3   
30   800125   2   
31   800125   1

Any advice on how to do this is highly appreciated!

We can put an if condition inside slice(): if the group has more than 2 rows, keep the last two rows whose outcome is not NA; otherwise keep all rows of the group.

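library(dplyr)
df %>%
  group_by(ID) %>%
  # Groups with more than 2 rows: last two rows with a non-NA outcome;
  # groups with 2 or fewer rows: keep every row
  slice(if (n() > 2) tail(which(!is.na(outcome)), 2) else 1:n())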

#      ID outcome
#   <int>   <int>
#1 800033       2
#2 800033       1
#3 800076       2
#4 800076      NA
#5 800100       4
#6 800100       3
#7 800125       2
#8 800125       1
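Note that the condition above tests the group size (n() > 2), not the number of valid outcomes, so a group with three or more rows but fewer than two non-NA outcomes would come back with fewer than two rows. If the intended rule is the one stated in the question (fall back to the last two rows whenever fewer than two valid values exist), a base-R sketch could look like the following. It assumes rows are already ordered within each ID as in the sample; last_two_valid() is a hypothetical helper name, not part of any package.

```r
# Sample data from the question, with the same row names
df <- data.frame(
  ID      = c(800033, 800033, 800033, 800033, 800033, 800076, 800076,
              800100, 800100, 800100, 800100, 800125, 800125, 800125),
  outcome = c(3, 3, NA, 2, 1, 2, NA, 4, 4, 4, 3, 2, 1, NA),
  row.names = c(1:5, 15:16, 17:20, 30:32)
)

# Hypothetical helper: the last two rows with a non-NA outcome if the group
# has at least two of them, otherwise simply the last two rows of the group.
last_two_valid <- function(g) {
  valid <- which(!is.na(g$outcome))
  if (length(valid) >= 2) {
    g[tail(valid, 2), , drop = FALSE]
  } else {
    tail(g, 2)
  }
}

# Apply per ID; unname() keeps the original row names instead of "ID.row"
res <- do.call(rbind, unname(lapply(split(df, df$ID), last_two_valid)))
res
```

On the sample data this returns the same eight rows as the expected result in the question; the two approaches only differ for groups with three or more rows and fewer than two valid outcomes.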

Another dplyr approach: first drop NA outcomes (except in groups with 2 or fewer rows), then keep the last two rows of each group:

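library(dplyr)
df %>% 
   group_by(ID) %>% 
   # Drop NA outcomes, except in groups with 2 or fewer rows
   filter(n() <= 2 | !is.na(outcome)) %>%
   # Keep the last two remaining rows of each group
   slice(tail(row_number(), 2))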
# A tibble: 8 x 2
# Groups:   ID [4]
#      ID outcome
#   <int>   <int>
#1 800033       2
#2 800033       1
#3 800076       2
#4 800076      NA
#5 800100       4
#6 800100       3
#7 800125       2
#8 800125       1
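For readers who prefer base R, the same two steps as the dplyr pipeline above (drop NA outcomes except in groups with at most two rows, then keep the last two remaining rows per ID) can be sketched with ave(). This is an illustrative translation, not part of the original answer; it assumes rows are already ordered within each ID, and the names grp_size, kept and pos_from_end are only illustrative.

```r
# Sample data from the question, with the same row names
df <- data.frame(
  ID      = c(800033, 800033, 800033, 800033, 800033, 800076, 800076,
              800100, 800100, 800100, 800100, 800125, 800125, 800125),
  outcome = c(3, 3, NA, 2, 1, 2, NA, 4, 4, 4, 3, 2, 1, NA),
  row.names = c(1:5, 15:16, 17:20, 30:32)
)

# Step 1, like filter(n() <= 2 | !is.na(outcome)):
# ave(..., FUN = length) gives every row the size of its ID group
grp_size <- ave(seq_along(df$ID), df$ID, FUN = length)
kept <- df[grp_size <= 2 | !is.na(df$outcome), ]

# Step 2, like slice(tail(row_number(), 2)):
# number the remaining rows from the end within each ID (1 = last row)
pos_from_end <- ave(seq_along(kept$ID), kept$ID,
                    FUN = function(i) rev(seq_along(i)))
res2 <- kept[pos_from_end <= 2, ]
res2
```

Like the dplyr version, this keys the exception on group size (at most two rows), not on the number of valid outcomes.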
