Select the last two rows of each group with a certain value on a variable in R

Question

If possible, I would like to select the last two rows of each group (ID) that have a valid value (i.e., not NA) on my outcome variable (outcome).

Sample data looks like this:

df <- read.table(text="
                      ID       outcome
                 1    800033   3
                 2    800033   3
                 3    800033   NA   
                 4    800033   2  
                 5    800033   1  
                 15   800076   2
                 16   800076   NA
                 17   800100   4     
                 18   800100   4  
                 19   800100   4  
                 20   800100   3   
                 30   800125   2   
                 31   800125   1   
                 32   800125   NA", header=TRUE)

If a participant does not have two valid values on my outcome variable (e.g., ID == 800076), I would still like to keep the last two rows of that group (ID). All other rows should be deleted.

My final data set would therefore look like this:

     ID       outcome
4    800033   2  
5    800033   1  
15   800076   2
16   800076   NA
19   800100   4  
20   800100   3   
30   800125   2   
31   800125   1

Any advice on how to do this is highly appreciated!
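For reference, the selection rule described in the question can also be sketched in base R without dplyr. This is a minimal sketch under stated assumptions: the helper name `pick_rows` is hypothetical, the sample data is rebuilt with `data.frame()` instead of `read.table()`, and a group with fewer than two valid values falls back to its last two rows, as the question asks.

```r
# Sample data from the question, rebuilt with data.frame() (original row names kept)
df <- data.frame(
  ID      = rep(c(800033L, 800076L, 800100L, 800125L), times = c(5, 2, 4, 3)),
  outcome = c(3L, 3L, NA, 2L, 1L, 2L, NA, 4L, 4L, 4L, 3L, 2L, 1L, NA),
  row.names = c(1:5, 15:16, 17:20, 30:32)
)

# Hypothetical helper: positions (within one group) of the rows to keep.
# Last two non-NA outcomes if there are at least two of them,
# otherwise simply the last two rows of the group.
pick_rows <- function(outcome) {
  valid <- which(!is.na(outcome))
  if (length(valid) >= 2) tail(valid, 2) else tail(seq_along(outcome), 2)
}

# Row positions of df, split by ID
groups <- split(seq_len(nrow(df)), df$ID)

# Map each group's local positions back to row positions in df
keep <- unlist(
  lapply(groups, function(idx) idx[pick_rows(df$outcome[idx])]),
  use.names = FALSE
)

result <- df[keep, ]
print(result)
```

On the sample data this keeps the rows named 4, 5, 15, 16, 19, 20, 30 and 31. Working on row positions rather than rbind-ing the pieces of `split(df, df$ID)` keeps the original row names, which binding a named list of data frames can alter.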

Answer 1

We can put an if condition inside slice: check whether the group has more than 2 rows and, if it does, keep the last two rows with a non-NA outcome; otherwise keep all rows of the group.

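library(dplyr)
df %>%
  group_by(ID) %>%
  # groups with more than 2 rows: last two rows with a non-NA outcome;
  # smaller groups: keep every row
  slice(if (n() > 2) tail(which(!is.na(outcome)), 2) else 1:n())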

#      ID outcome
#   <int>   <int>
#1 800033       2
#2 800033       1
#3 800076       2
#4 800076      NA
#5 800100       4
#6 800100       3
#7 800125       2
#8 800125       1
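To see exactly which rows Answer 1 keeps, its slice() argument can be mirrored as a plain base-R function of one group's outcome vector. This is an illustrative sketch: `answer1_rows` is a hypothetical name, and `seq_len(n)` stands in for `1:n()`. It reproduces the output above, and it also shows that the condition tests the group size (`n() > 2`) rather than the number of valid values, so a group with three or more rows but fewer than two non-NA outcomes ends up with fewer than two rows.

```r
# Hypothetical base-R mirror of Answer 1's slice() argument for one group
answer1_rows <- function(outcome) {
  n <- length(outcome)
  if (n > 2) {
    tail(which(!is.na(outcome)), 2)  # last two non-NA positions
  } else {
    seq_len(n)                       # small group: keep every row, like 1:n()
  }
}

answer1_rows(c(3L, 3L, NA, 2L, 1L))  # group 800033 -> 4 5
answer1_rows(c(2L, NA))              # group 800076 -> 1 2
answer1_rows(c(7L, NA, NA))          # made-up group, one valid value -> 1
```

If the fallback should apply whenever fewer than two outcomes are valid, one option is to test `sum(!is.na(outcome)) >= 2` instead of `n() > 2` and use `tail(seq_len(n()), 2)` in the else branch.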

Answer 2

We can do this with dplyr:

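library(dplyr)
df %>% 
   group_by(ID) %>% 
   # keep every row of groups with at most 2 rows, otherwise only non-NA outcomes
   filter(n() <= 2 | !is.na(outcome)) %>%
   # then keep the last two remaining rows of each group
   slice(tail(row_number(), 2))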
# A tibble: 8 x 2
# Groups:   ID [4]
#      ID outcome
#   <int>   <int>
#1 800033       2
#2 800033       1
#3 800076       2
#4 800076      NA
#5 800100       4
#6 800100       3
#7 800125       2
#8 800125       1
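Answer 2 reaches the same result in two steps: filter() keeps every row of groups with at most two rows but only the non-NA rows of larger groups, and slice(tail(row_number(), 2)) then takes the last two surviving rows. A minimal base-R sketch of those steps for a single group (the name `answer2_rows` is hypothetical) makes the order explicit:

```r
# Hypothetical base-R mirror of Answer 2's filter() + slice() for one group
answer2_rows <- function(outcome) {
  n <- length(outcome)
  # filter(n() <= 2 | !is.na(outcome)): positions that survive the filter
  kept <- which(n <= 2 | !is.na(outcome))
  # slice(tail(row_number(), 2)): last two surviving positions
  tail(kept, 2)
}

answer2_rows(c(3L, 3L, NA, 2L, 1L))  # group 800033 -> 4 5
answer2_rows(c(2L, NA))              # group 800076 -> 1 2
answer2_rows(c(4L, 4L, 4L, 3L))      # group 800100 -> 3 4
answer2_rows(c(2L, 1L, NA))          # group 800125 -> 1 2
```

Because the filter also keys on the group size, this selects the same rows as Answer 1 on the sample data.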


Answer 1: score 1, 2019-05-03 15:20:01

Answer 2: score 0, 2019-05-03 15:41:50
