
How to get the first rows in an R dataframe that meet a specific condition?

I have a data frame with many thousands of rows. Every row is a hospitalization record containing the patient's ID and a lot of health information (diagnosis, admission date, discharge date, and so on).

A patient can have more than one hospitalization record, but I need only each patient's first hospitalization, i.e., the earliest record for each patient ID according to the admission date. How can I get this result in R?

Thank you in advance.

I think I have a solution, but there's probably a smoother way to do this.

Try this using dplyr. Note: I assume that by 'first' record you mean the oldest record. If you want the most recent record instead, use max() in place of min().

install.packages('dplyr')  # only needed once
library(dplyr)

your_data <- group_by(your_data, patientID)
## This gives you a data frame with the first admission date for each patient ID.
## Naming the summary column admit_date lets the next step refer to it.
first_records <- summarise(your_data, admit_date = min(admit_date))

## Create an ID to match on (patient ID + admission date)
first_records$matchID <- paste(first_records$patientID, first_records$admit_date)
your_data$matchID <- paste(your_data$patientID, your_data$admit_date)

## Get the complete records for those first admissions
first_records <- your_data[your_data$matchID %in% first_records$matchID, ]
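For readers without dplyr, the same idea (find each patient's earliest admission date, then join it back to recover the full rows) can be sketched in base R. This is a minimal sketch, not the answer's own code: the toy data frame `hosp` and its columns `patientID`, `admit_date` and `diagnosis` are hypothetical, and `merge()` is swapped in for the pasted match key.

```r
# Hypothetical toy data (assumed column names: patientID, admit_date, diagnosis)
hosp <- data.frame(
  patientID  = c(2, 1, 2, 1, 3),
  admit_date = as.Date(c("2019-03-01", "2018-07-15", "2017-11-20",
                         "2019-01-05", "2020-02-10")),
  diagnosis  = c("B", "A", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Step 1: earliest admission per patient, computed on the numeric day count
# and converted back to Date so it can be joined against admit_date
first_num <- tapply(as.numeric(hosp$admit_date), hosp$patientID, min)
first_dates <- data.frame(
  patientID  = as.numeric(names(first_num)),
  admit_date = as.Date(as.vector(first_num), origin = "1970-01-01")
)

# Step 2: an inner join on patient ID and date keeps only the complete rows
# for each patient's first admission (all of them, if several share that date)
first_records <- merge(hosp, first_dates, by = c("patientID", "admit_date"))
print(first_records)
```

With this toy data, one row per patient survives: the 2018-07-15, 2017-11-20 and 2020-02-10 admissions.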

Let me know how this goes.
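The note about min() versus max() also has a simple base R counterpart: sort by patient and admission date, then keep the first row per patient with `duplicated()`. This is a different technique from the dplyr code above, shown only as a sketch; the data and column names are hypothetical. Unlike matching on the minimum date, it returns exactly one row per patient even when two admissions share the same date.

```r
# Hypothetical toy data (assumed column names: patientID, admit_date, diagnosis)
hosp <- data.frame(
  patientID  = c(2, 1, 2, 1, 3),
  admit_date = as.Date(c("2019-03-01", "2018-07-15", "2017-11-20",
                         "2019-01-05", "2020-02-10")),
  diagnosis  = c("B", "A", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Oldest admission: sort by patient, then by ascending date, and keep the
# first row seen for each patient ID
by_oldest <- hosp[order(hosp$patientID, hosp$admit_date), ]
first_adm <- by_oldest[!duplicated(by_oldest$patientID), ]

# Most recent admission: same idea, but sort dates in decreasing order
by_newest <- hosp[order(hosp$patientID, -as.numeric(hosp$admit_date)), ]
last_adm <- by_newest[!duplicated(by_newest$patientID), ]

print(first_adm)
print(last_adm)
```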

EDIT: @alistaire posted what is definitely an easier solution:

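your_data <- group_by(your_data, patientID)
## Keep every row whose admission date equals that patient's earliest date
## (if several admissions tie on that date, all of them are kept)
first_records <- filter(your_data, admit_date == min(admit_date))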
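Because the `filter()` version compares each row against its patient's minimum, tied first admissions are all kept. A base R equivalent of that per-group comparison can be sketched with `ave()`; this is only an illustration, and the toy data below is hypothetical, deliberately including a tie for patient 1.

```r
# Hypothetical toy data with two admissions on the same day for patient 1
hosp <- data.frame(
  patientID  = c(1, 1, 2, 2),
  admit_date = as.Date(c("2018-07-15", "2018-07-15", "2019-03-01", "2017-11-20")),
  diagnosis  = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the minimum day count within that row's patient
day <- as.numeric(hosp$admit_date)
first_day <- ave(day, hosp$patientID, FUN = min)

# Same logic as filter(admit_date == min(admit_date)) on grouped data
first_records <- hosp[day == first_day, ]

# Optional: drop ties afterwards to get exactly one row per patient
one_per_patient <- first_records[!duplicated(first_records$patientID), ]
print(first_records)
print(one_per_patient)
```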

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM