
R: How to combine rows of a data frame with the same id and take the newest non-NA value?

Example data frame

date       name     speed  acceleration
1/1/17     bob      5      NA
1/1/15     george   5      NA
1/1/15     bob      NA     4
1/1/17     bob      4      NA

I want to condense all rows with the same name into one row, keeping the newest non-NA value in each of the speed and acceleration columns.

Desired output

date       name     speed  acceleration
1/1/17     bob      5      4
1/1/15     george   5      NA

You can do it this way:

library(dplyr)
library(lubridate)

input = read.table(text = 
 "date       name     speed  acceleration
  1/1/17     bob      5      NA
  1/1/15     george   5      NA
  1/1/15     bob      NA     4
  1/1/17     bob      4      NA",
  header = TRUE, stringsAsFactors = FALSE)

output <- input %>%
  mutate(date = mdy(date)) %>% # or maybe dmy, depending on your date format
  group_by(name) %>%
  arrange(desc(date)) %>%
  summarise(across(everything(), ~ na.omit(.x)[1])) # first non-NA value per column, newest first

output
# # A tibble: 2 × 4
#     name       date speed acceleration
#    <chr>     <date> <int>        <int>
# 1    bob 2017-01-01     5            4
# 2 george 2015-01-01     5           NA
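
The step that does the real work is `na.omit(x)[1]`: it drops the missing values and then takes the first remaining element, which is the newest one because the rows were sorted by descending date. A minimal sketch of that idiom on its own, where `first_non_na` is a hypothetical helper name that does not appear in the answer:

```r
# Hypothetical helper: first non-NA element of a vector, or NA if there is none.
first_non_na <- function(x) na.omit(x)[1]

first_non_na(c(NA, 4L, 7L))   # 4
first_non_na(c(5L, 4L, NA))   # 5
first_non_na(c(NA_integer_))  # NA: indexing an empty vector with [1] yields NA
```

The last case is why george keeps NA for acceleration instead of raising an error.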

Here is an option using data.table. Convert the data.frame to a data.table with setDT(input), order the rows by date in descending order (after converting it to Date class), group by name, then loop through the remaining columns and take the first non-NA element of each.

library(data.table)
library(lubridate)
setDT(input)[order(-mdy(date)), lapply(.SD, function(x) x[!is.na(x)][1]), name]
#     name   date speed acceleration
#1:    bob 1/1/17     5            4
#2: george 1/1/15     5           NA
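
If neither package is available, the same sort-then-take-first-non-NA idea can probably be written in base R. This is only a sketch, not part of either answer; it assumes the dates are month/day/two-digit-year, and the names `sorted` and `output` are illustrative:

```r
# Same example data as in the question.
input <- read.table(text =
 "date       name     speed  acceleration
  1/1/17     bob      5      NA
  1/1/15     george   5      NA
  1/1/15     bob      NA     4
  1/1/17     bob      4      NA",
  header = TRUE, stringsAsFactors = FALSE)

# Assumption: month/day/two-digit-year; adjust the format string if needed.
input$date <- as.Date(input$date, format = "%m/%d/%y")

# Newest rows first. Sorting on the negated numeric date keeps tied rows
# in their original order.
sorted <- input[order(-as.numeric(input$date)), ]

# Split by name, then take the first non-NA value of every column per group.
output <- do.call(rbind, lapply(split(sorted, sorted$name), function(g) {
  as.data.frame(lapply(g, function(x) x[!is.na(x)][1]))
}))

output
```

Unlike the data.table version, the date column here is overwritten with Date values, so convert a copy if the original strings need to be kept.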
