R: How to combine rows of a data frame with the same id and take the newest non-NA value?
Example data frame
date name speed acceleration
1/1/17 bob 5 NA
1/1/15 george 5 NA
1/1/15 bob NA 4
1/1/17 bob 4 NA
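I want to collapse all rows with the same name into a single row, keeping the newest non-NA value for the speed and acceleration columns.
Expected output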
date name speed acceleration
1/1/17 bob 5 4
1/1/15 george 5 NA
You can do it this way:
library(dplyr)
library(lubridate)
input = read.table(text =
"date name speed acceleration
1/1/17 bob 5 NA
1/1/15 george 5 NA
1/1/15 bob NA 4
1/1/17 bob 4 NA",
header = TRUE, stringsAsFactors = FALSE)
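output <- input %>%
  mutate(date = mdy(date)) %>% # or maybe dmy, depending on your date format
  group_by(name) %>%
  arrange(desc(date)) %>% # newest rows first
  # across() replaces the deprecated summarise_all(funs(...));
  # for each column, keep the first non-NA value within each name
  summarise(across(everything(), ~ na.omit(.x)[1]))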
output
# # A tibble: 2 × 4
# name date speed acceleration
# <chr> <date> <int> <int>
# 1 bob 2017-01-01 5 4
# 2 george 2015-01-01 5 NA
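The `na.omit(...)[1]` idiom in the summary step is what selects the first non-NA value of each column, and it is also why george's acceleration stays NA. A minimal base R sketch of that behavior follows; the helper name `first_non_na` is hypothetical and does not come from dplyr or base R.

```r
# Hypothetical helper (the name is an assumption, not a library function):
# drop the NAs, then take the first remaining element.
first_non_na <- function(x) na.omit(x)[1]

# The values are assumed to be sorted newest first already,
# so the first non-NA value is the newest one.
newest <- first_non_na(c(NA, 4L, 7L))

# When every value is NA, na.omit() leaves an empty vector,
# and indexing it with [1] yields NA again.
all_missing <- first_non_na(c(NA_integer_, NA_integer_))
```

Here `newest` is 4 and `all_missing` is NA, which matches the bob and george rows in the output above.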
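Here is an option using data.table. Convert the 'data.frame' to a 'data.table' (setDT(input)), order by 'date' after converting it to Date class, group by 'name', then loop over the columns and take the first non-NA element.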
library(data.table)
library(lubridate)
setDT(input)[order(-mdy(date)), lapply(.SD, function(x) x[!is.na(x)][1]), name]
# name date speed acceleration
#1: bob 1/1/17 5 4
#2: george 1/1/15 5 NA
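The one-liner above leaves 'date' as a character string in the result. The same steps can be sketched one at a time, converting the column with base `as.Date()` instead of lubridate. This is only a sketch under assumptions: the names `dt` and `result` are illustrative, the sample data is retyped by hand, and the dates are assumed to be month/day/year.

```r
library(data.table)

# Sample data from the question, rebuilt by hand (assumption: m/d/y dates).
dt <- data.table(
  date = c("1/1/17", "1/1/15", "1/1/15", "1/1/17"),
  name = c("bob", "george", "bob", "bob"),
  speed = c(5L, 5L, NA, 4L),
  acceleration = c(NA, NA, 4L, NA)
)

# Step 1: convert the character dates to Date class, by reference.
dt[, date := as.Date(date, format = "%m/%d/%y")]

# Step 2: sort newest first, by reference.
setorder(dt, -date)

# Step 3: per name, take the first non-NA element of every other column.
result <- dt[, lapply(.SD, function(x) x[!is.na(x)][1]), by = name]
```

When two rows share the same date, as bob's two 1/1/17 rows do, the value kept for a column depends on how those tied rows are ordered, so add a tie-breaking column to the sort if that matters.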