Equivalent of summarise_all for group_by and slice

Question

I'm currently using group_by followed by slice to get the maximum date per group in my data. A few rows have NA as the date, and when I use slice(which.max(Date)), those rows get dropped. Is there an equivalent of summarise_all, so that I can keep the NAs in my data?

ID Date         INitials
1  01-01-2020   AZ
1  02-01-2020   BE
2  NA           CC

I'm using

df %>%
  group_by(ID) %>%
  slice(which.max(Date))

I need the final result to look like the table below, but the NA row is being dropped entirely:

ID Date        Initials
1  02-01-2020  BE
2  NA          CC

Answer 1

It's dropping the NA because you're asking it to find the max date, and an NA can never be the max. If you want to go the "which.max" route, then I'd just run the original dataset again, using filter, grab the NA(s), and bind them to the sliced result.

    # NA rows come from the original data; df is then overwritten with the result
    df.1 <- df %>%
      filter(is.na(Date))

    df <- df %>%
      group_by(ID) %>%
      slice(which.max(Date)) %>%
      bind_rows(df.1)
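
To see why the all-NA group vanishes in the first place, it can help to look at which.max() on its own. This is a minimal base R sketch with made-up vectors (x_mixed and x_all_na are illustrative names, not columns from the question):

```r
# Illustrative vectors only; they are not taken from the question's data.
x_mixed  <- c(3, NA, 7, 7)          # one NA, and the maximum appears twice
x_all_na <- c(NA_real_, NA_real_)   # nothing but missing values

# NA is skipped, and only the first of the tied maxima is reported.
idx_mixed <- which.max(x_mixed)
print(idx_mixed)                    # 3

# With no non-missing value there is no maximum, so the index is empty.
idx_all_na <- which.max(x_all_na)
print(length(idx_all_na))           # 0
```

When a group's dates are all NA, slice() receives that empty index and returns no rows for the group, which is why ID 2 disappears.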

Answer 2

which.max() is not suitable in this case because (1) it drops missing values and (2) it returns only the first position of the maximum, so tied rows are lost. Here is a general solution:

library(dplyr)

df %>%
  mutate(Date = as.Date(Date, "%m-%d-%Y")) %>% 
  group_by(ID) %>%
  filter(Date == max(Date) | all(is.na(Date)))

# # A tibble: 2 x 3
# # Groups:   ID [2]
#      ID Date       INitials
#   <int> <date>     <fct>   
# 1     1 2020-02-01 BE      
# 2     2 NA         CC

df <- structure(list(ID = c(1L, 1L, 2L), Date = structure(c(1L, 2L, 
NA), .Label = c("01-01-2020", "02-01-2020"), class = "factor"), 
INitials = structure(1:3, .Label = c("AZ", "BE", "CC"), class = "factor")),
class = "data.frame", row.names = c(NA, -3L))
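
One case the sample data does not cover is a group that mixes real dates with an NA. There, max(Date) is itself NA, so the comparison Date == max(Date) yields NA for every row and filter() drops the whole group. The following is a hedged sketch of a variant for that situation, not part of the original answer; df2 and res are hypothetical names, the extra "XX" row is invented for illustration, and it assumes you want the NA row dropped whenever the group has at least one real date:

```r
library(dplyr)

# Hypothetical data: ID 1 mixes real dates with an NA, ID 2 is entirely NA.
df2 <- data.frame(
  ID       = c(1L, 1L, 1L, 2L),
  Date     = as.Date(c("2020-01-01", "2020-02-01", NA, NA)),
  INitials = c("AZ", "BE", "XX", "CC")
)

res <- df2 %>%
  group_by(ID) %>%
  filter(
    if (all(is.na(Date))) {
      TRUE                                              # all-NA group: keep its rows
    } else {
      !is.na(Date) & Date == max(Date, na.rm = TRUE)    # otherwise keep the latest date(s)
    }
  ) %>%
  ungroup()

print(res)
```

The if/else is evaluated once per group, so max(..., na.rm = TRUE) is never called on a group with no dates at all, which avoids the warning that call would otherwise raise. Ties on the latest date are all kept, as in the filter() above.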

2 answers

solution1
0 2020-06-04 16:14:43

solution2
0 ACCEPTED 2020-06-04 17:13:38