Select last two rows of each group with certain value on a variable in R
If possible, I would like to select the last two rows of each group (ID) that have a valid value (i.e., not NA) on my outcome variable (outcome).
Sample data looks like this:
df <- read.table(text="
ID outcome
1 800033 3
2 800033 3
3 800033 NA
4 800033 2
5 800033 1
15 800076 2
16 800076 NA
17 800100 4
18 800100 4
19 800100 4
20 800100 3
30 800125 2
31 800125 1
32 800125 NA", header=TRUE)
If a participant does not have two valid values on my outcome variable (e.g., ID == 800076), I would still like to keep the last two rows of this group (ID). All other rows should be deleted.
My final data set would therefore look like this:
ID outcome
4 800033 2
5 800033 1
15 800076 2
16 800076 NA
19 800100 4
20 800100 3
30 800125 2
31 800125 1
Any advice on how to do this is highly appreciated!
We can put an if condition inside slice: if the group has more than two rows, select the last two positions where outcome is not NA; otherwise keep all rows of the group.
library(dplyr)

df %>%
  group_by(ID) %>%
  # groups with more than 2 rows: last two non-NA positions; otherwise all rows
  slice(if (n() > 2) tail(which(!is.na(outcome)), 2) else 1:n())
# ID outcome
# <int> <int>
#1 800033 2
#2 800033 1
#3 800076 2
#4 800076 NA
#5 800100 4
#6 800100 3
#7 800125 2
#8 800125 1
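One case the sample data does not cover is a group with more than two rows but fewer than two valid values; there, the condition above would return fewer than two rows. A minimal sketch of one way to handle it, assuming the intended rule is "last two valid rows if they exist, otherwise simply the last two rows of the group". The helper name `keep_rows` and its parameter `k` are hypothetical, introduced only for illustration:

```r
library(dplyr)

df <- read.table(text = "
ID outcome
1 800033 3
2 800033 3
3 800033 NA
4 800033 2
5 800033 1
15 800076 2
16 800076 NA
17 800100 4
18 800100 4
19 800100 4
20 800100 3
30 800125 2
31 800125 1
32 800125 NA", header = TRUE)

# Hypothetical helper: row positions to keep within one group.
# Assumption: fall back to the last k rows when fewer than k values are valid.
keep_rows <- function(outcome, k = 2) {
  valid <- which(!is.na(outcome))
  if (length(valid) >= k) tail(valid, k) else tail(seq_along(outcome), k)
}

result <- df %>%
  group_by(ID) %>%
  slice(keep_rows(outcome)) %>%
  ungroup()

# Edge case not present in the sample data: three rows, one valid value.
edge_positions <- keep_rows(c(NA, NA, 5L))  # positions 2 and 3
```

On the sample data this should give the same eight rows as the output shown above; the difference only appears for groups like the edge case at the end.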
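We can also do this with dplyr by filtering first: keep every row of groups that have at most two rows, drop the NA rows of larger groups, and then take the last two rows of each group.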
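library(dplyr)

df %>%
  group_by(ID) %>%
  # keep small groups whole; in larger groups keep only rows with a valid outcome
  filter(n() <= 2 | !is.na(outcome)) %>%
  # then take the last two remaining rows of each group
  slice(tail(row_number(), 2))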
# A tibble: 8 x 2
# Groups: ID [4]
# ID outcome
# <int> <int>
#1 800033 2
#2 800033 1
#3 800076 2
#4 800076 NA
#5 800100 4
#6 800100 3
#7 800125 2
#8 800125 1
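If dplyr is not available, the same selection can probably be written in base R with split and lapply. This is only a sketch under one assumption: a group with fewer than two valid values keeps its last two rows, whatever its size. The helper name `keep_last_two` is hypothetical:

```r
df <- read.table(text = "
ID outcome
1 800033 3
2 800033 3
3 800033 NA
4 800033 2
5 800033 1
15 800076 2
16 800076 NA
17 800100 4
18 800100 4
19 800100 4
20 800100 3
30 800125 2
31 800125 1
32 800125 NA", header = TRUE)

# Hypothetical helper: subset one group's data frame.
# Assumption: fall back to the last two rows when fewer than two values are valid.
keep_last_two <- function(g) {
  valid <- which(!is.na(g$outcome))
  idx <- if (length(valid) >= 2) tail(valid, 2) else tail(seq_len(nrow(g)), 2)
  g[idx, , drop = FALSE]
}

# Split by ID, subset each piece, and bind the pieces back together.
# unname() avoids prefixing the row names with the group label.
result <- do.call(rbind, unname(lapply(split(df, df$ID), keep_last_two)))
```

Note that split orders the groups by sorted ID, which matches the row order of the sample data here but may reorder groups in data that is not sorted by ID.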