
How to make a subset data frame from the last row of repeated observations?

Simple question. I have a data frame in which each subject has a varying number of observations of a time variable and a status variable (death/alive). I want a subset containing only the last observation of each subject. Because the number of observations per subject varies, and there are 1143 observations from 690 subjects, picking them out by hand would be a headache. Aggregation won't do the trick, because each subject's last observation is already an aggregated "time value" of the previous ones.

       name visit.date status

30   20        337      1
31   20        421      1
32   20        502      0  <- Row to subset
33   21        427      0  <- Row to subset
34   22         NA     NA  <- Row to subset
35   23        800      1
36   23        882      0  <- Row to subset
37   24        157      1
38   24        185      1
39   24        214      1
40   24        298      1
41   24        381      1  <- Row to subset
42   25        386      1  <- Row to subset
43   26         NA     NA  <- Row to subset
44   27        522      1
45   27        643      1
46   27        711      1  <- Row to subset
47   28        280      0  <- Row to subset
48   29        227      1
49   29        322      1
50   29        335      0  <- Row to subset
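To try the answers below without the full 1143-row data set, the excerpt above can be typed in as a small data frame. This is only a transcription of the printed rows; the object name df is assumed to match the name used in the answers.

```r
# Sample data transcribed from the printed excerpt; row names 30-50 match the printout
df <- data.frame(
  name       = rep(20:29, times = c(3, 1, 1, 2, 5, 1, 1, 3, 1, 3)),
  visit.date = c(337, 421, 502, 427, NA, 800, 882, 157, 185, 214, 298,
                 381, 386, NA, 522, 643, 711, 280, 227, 322, 335),
  status     = c(1, 1, 0, 0, NA, 1, 0, 1, 1, 1, 1, 1, 1, NA, 1, 1, 1, 0, 1, 1, 0),
  row.names  = 30:50
)

# Rows per subject: subjects 20, 23, 24, 27 and 29 have repeated visits
table(df$name)
```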

As you can see, some subjects have only one observation, and I'll keep those, but other subjects have 2, 3 or more observations. How can I subset those and make a data frame with just one observation per subject (a total of 620 rows)? This is for a survival analysis, which I can run on the data frame as it is, but I cannot fit a coxph model on it because the independent variable I want to contrast is only 620 values long (one per subject).
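For the Cox model, the one-row-per-subject table and the per-subject covariate have to line up. A minimal sketch, assuming the sample rows from the question and a hypothetical covariate table covars whose group values are invented purely for illustration: base R's duplicated(..., fromLast = TRUE) keeps each subject's final row without requiring the rows to be contiguous, and merge() attaches the covariate by subject ID rather than by position.

```r
# Sample data from the question (row names 30-50 as in the printout)
df <- data.frame(
  name       = rep(20:29, times = c(3, 1, 1, 2, 5, 1, 1, 3, 1, 3)),
  visit.date = c(337, 421, 502, 427, NA, 800, 882, 157, 185, 214, 298,
                 381, 386, NA, 522, 643, 711, 280, 227, 322, 335),
  status     = c(1, 1, 0, 0, NA, 1, 0, 1, 1, 1, 1, 1, 1, NA, 1, 1, 1, 0, 1, 1, 0),
  row.names  = 30:50
)

# duplicated(fromLast = TRUE) flags every row that has a later row with the same
# name, so negating it keeps each subject's final row (rows assumed in time order)
last_obs <- df[!duplicated(df$name, fromLast = TRUE), ]

# Exactly one row per subject
stopifnot(nrow(last_obs) == length(unique(df$name)))

# HYPOTHETICAL per-subject covariate: values invented for illustration only
covars <- data.frame(name = 20:29, group = rep(c("A", "B"), times = 5))

# Attach the covariate by subject ID instead of relying on a shared row order
analysis <- merge(last_obs, covars, by = "name")
analysis
```

Assuming status is coded 1 = event, analysis could then be passed to survival::coxph(), for example coxph(Surv(visit.date, status) ~ group, data = analysis). Rows with a missing time or status (subjects 22 and 26 in the excerpt) are dropped under the default na.action.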

Thank you in advance!

Here's a solution using dplyr, with a base R alternative:

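library(dplyr)
# dplyr: keep the last row within each subject (rows assumed in time order)
df %>% group_by(name) %>% filter(row_number() == n()) %>% ungroup()
# Base R: keep a row when the next row is a different subject, plus the final row
# (assumes each subject's rows are contiguous); this prints the output below
df[c(df$name[-nrow(df)] != df$name[-1L], TRUE), ]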
##    name visit.date status
## 32   20        502      0
## 33   21        427      0
## 34   22         NA     NA
## 36   23        882      0
## 41   24        381      1
## 42   25        386      1
## 43   26         NA     NA
## 46   27        711      1
## 47   28        280      0
## 50   29        335      0
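The base R one-liner works by comparing each row's name with the next row's name, so it treats the end of every run of identical names as a subject's last row. That assumes each subject's rows are contiguous. The sketch below, on the sample rows from the question, contrasts it with a base R analogue of dplyr's filter(row_number() == n()) built from ave(); both still assume that rows within a subject are in time order. The names is_run_end, is_last, df2, run_rule and ave_rule are illustrative only.

```r
# Sample data from the question (row names 30-50 as in the printout)
df <- data.frame(
  name       = rep(20:29, times = c(3, 1, 1, 2, 5, 1, 1, 3, 1, 3)),
  visit.date = c(337, 421, 502, 427, NA, 800, 882, 157, 185, 214, 298,
                 381, 386, NA, 522, 643, 711, 280, 227, 322, 335),
  status     = c(1, 1, 0, 0, NA, 1, 0, 1, 1, 1, 1, 1, 1, NA, 1, 1, 1, 0, 1, 1, 0),
  row.names  = 30:50
)

# Shifted comparison: TRUE where a row's name differs from the next row's name,
# i.e. the row ends a run of identical names; TRUE is appended for the final row
is_run_end <- c(df$name[-nrow(df)] != df$name[-1L], TRUE)

# Base R analogue of filter(row_number() == n()): a row's position within its
# subject compared with that subject's row count
pos_in_subject  <- ave(seq_along(df$name), df$name, FUN = seq_along)
rows_in_subject <- ave(seq_along(df$name), df$name, FUN = length)
is_last <- pos_in_subject == rows_in_subject

# On the sorted excerpt both rules agree
stopifnot(identical(is_run_end, is_last))

# Split subject 20 by moving its last row ("32") to the end of the data frame;
# the within-subject order 30, 31, 32 is unchanged
df2 <- df[c(setdiff(rownames(df), "32"), "32"), ]
run_rule <- df2[c(df2$name[-nrow(df2)] != df2$name[-1L], TRUE), ]
ave_rule <- df2[ave(seq_along(df2$name), df2$name, FUN = seq_along) ==
                ave(seq_along(df2$name), df2$name, FUN = length), ]

sum(run_rule$name == 20)  # 2: the run-based rule keeps subject 20 twice
sum(ave_rule$name == 20)  # 1: the ave() rule keeps only row "32"
```

If the rows might not be in time order within a subject, sorting first (for example df[order(df$name, df$visit.date), ]) before either rule is one option, keeping in mind that order() places NA dates last by default.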
