I'm currently using group_by followed by slice to get the maximum date per group in my data. In a few rows the date is NA, and when I use slice(which.max(END_DT)) those rows get dropped. Is there an equivalent of summarise_all that would let me keep the NAs in my data?
ID Date INitials
1 01-01-2020 AZ
1 02-01-2020 BE
2 NA CC
I'm using
df %>%
group_by(ID) %>%
slice(which.max(Date))
I need the final result to look like the table below, but the NA row is being dropped entirely:
ID Date Initials
1 02-01-2020 BE
2 NA CC
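It's dropping the NA because you're asking for the position of the maximum date, and an NA can never be that maximum. For a group whose dates are all NA, which.max() returns no position at all, so slice() keeps nothing for that group. If you want to stay with the which.max route, I'd run the dataset through filter a second time, grab the NA rows, and bind them to the sliced result.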
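df_max <- df %>%
group_by(ID) %>%
slice(which.max(Date)) %>%
ungroup()
df_na <- df %>% filter(is.na(Date)) # rows with a missing date, taken from the original data
df <- bind_rows(df_max, df_na)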
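To see why the NA group disappears, it may help to look at what which.max() returns on its own. The sketch below uses base R only; the two vectors are made up for illustration and are not the asker's data.

```r
# Hypothetical vectors, for illustration only
x_mixed  <- as.Date(c("2020-01-01", NA, "2020-02-01"))
x_all_na <- as.Date(c(NA, NA))

# NAs are skipped; the position of the first maximum is returned
pos_mixed <- which.max(x_mixed)

# With nothing but NAs there is no maximum, so the result has length zero
pos_all_na <- which.max(x_all_na)

length(pos_all_na)
```

A zero-length index is what slice() receives for an all-NA group, which is why no row from that group survives.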
which.max() is not suitable in this case because (1) it skips missing values, so a group that contains only NA yields no row, and (2) it returns only the first position of the maximum, so tied rows are lost. Here is a general solution:
library(dplyr)
df %>%
mutate(Date = as.Date(Date, "%m-%d-%Y")) %>%
group_by(ID) %>%
filter(Date == max(Date) | all(is.na(Date)))
# # A tibble: 2 x 3
# # Groups: ID [2]
# ID Date INitials
# <int> <date> <fct>
# 1 1 2020-02-01 BE
# 2 2 NA CC
Data
df <- structure(list(ID = c(1L, 1L, 2L), Date = structure(c(1L, 2L,
NA), .Label = c("01-01-2020", "02-01-2020"), class = "factor"),
INitials = structure(1:3, .Label = c("AZ", "BE", "CC"), class = "factor")),
class = "data.frame", row.names = c(NA, -3L))
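One case the sample data does not cover is a group that mixes real dates with NA. There, max(Date) without na.rm = TRUE is itself NA, so a condition like Date == max(Date) evaluates to NA for every row and the whole group is filtered out. A possible variant that handles this is sketched below; df_demo and its values are hypothetical, and this is only one way to write it, not the only one.

```r
library(dplyr)

# Hypothetical data: ID 1 mixes dates with an NA, ID 2 is all NA, ID 3 has a tie
df_demo <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 3L, 3L),
  Date = as.Date(c("2020-01-01", "2020-02-01", NA, NA, "2020-03-01", "2020-03-01")),
  Initials = c("AZ", "BE", "XX", "CC", "DD", "EE")
)

res <- df_demo %>%
  group_by(ID) %>%
  filter(
    if (all(is.na(Date))) {
      TRUE  # all-NA group: keep every row
    } else {
      # otherwise keep only the rows sitting at the non-missing maximum
      !is.na(Date) & Date == max(Date, na.rm = TRUE)
    }
  ) %>%
  ungroup()

res
```

The if/else means max(Date, na.rm = TRUE) is only evaluated for groups that have at least one real date, which avoids the warning max() would give on an all-NA group. Tied maxima (ID 3 here) are all kept; add a slice(1) step per group if only one row per ID is wanted.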