
Equivalent of summarise_all for group_by and slice

I'm currently using group_by followed by slice to get the maximum date per group in my data. There are a few rows where the date is NA, and when using slice(which.max(Date)), those rows end up getting dropped. Is there an equivalent of summarise_all for this, so that I can keep the NAs in my data?

ID Date         INitials
1  01-01-2020   AZ
1  02-01-2020   BE
2  NA           CC

I'm using

df %>%
  group_by(ID) %>%
  slice(which.max(Date))

I need the final result to look like the table below, but the NA row is being dropped entirely:

ID Date        INitials
1  02-01-2020  BE
2  NA          CC

The NA row is dropped because which.max() skips missing values: for a group whose dates are all NA it returns integer(0), so slice() keeps no rows for that group. If you want to stay with the which.max() route, I'd filter the original data for the NA rows separately and bind them onto the sliced data.


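    # Latest date per ID (groups whose dates are all NA are dropped here)
    df.max <- df %>% group_by(ID) %>% slice(which.max(Date)) %>% ungroup()
    # Rows with a missing date, taken from the original data
    df.1 <- df %>% filter(is.na(Date))

    df <- rbind(df.max, df.1)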
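
To see why the all-NA group disappears, the behaviour of which.max() can be checked in base R. This is only a small sketch: the vectors `some_na`, `all_na`, and `tied` are made up for illustration and are not part of the question's data.

```r
# Hypothetical vectors, for illustration only
some_na <- as.Date(c("2020-01-01", NA, "2020-02-01"))
all_na  <- as.Date(c(NA, NA))
tied    <- c(3, 7, 7)

pos_some <- which.max(some_na)  # the NA is skipped; position of the latest date
pos_all  <- which.max(all_na)   # nothing to pick from, so a zero-length integer
pos_tied <- which.max(tied)     # only the first of the tied maxima is reported

print(pos_some)
print(pos_all)
print(pos_tied)
```

Because slice() receives a zero-length index for an all-NA group, it keeps no rows for that group.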

which.max() is not suitable in this case because (1) it skips missing values, so a group whose dates are all NA yields no row, and (2) it returns only the first position of the maximum, so tied rows are lost. Here is a more general solution:

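library(dplyr)

df %>%
  # Parse the date strings using the month-day-year format
  mutate(Date = as.Date(Date, "%m-%d-%Y")) %>%
  group_by(ID) %>%
  # Keep rows tied for the latest date, plus all rows of groups whose dates are all NA
  filter(Date == max(Date) | all(is.na(Date)))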

# # A tibble: 2 x 3
# # Groups:   ID [2]
#      ID Date       INitials
#   <int> <date>     <fct>   
# 1     1 2020-02-01 BE      
# 2     2 NA         CC   

df <- structure(list(ID = c(1L, 1L, 2L), Date = structure(c(1L, 2L, 
NA), .Label = c("01-01-2020", "02-01-2020"), class = "factor"), 
INitials = structure(1:3, .Label = c("AZ", "BE", "CC"), class = "factor")),
class = "data.frame", row.names = c(NA, -3L))
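
One caveat with the filter above: if a group mixes missing and non-missing dates, max(Date) is NA for that group, so the comparison is NA and the whole group is dropped. The following is a hedged sketch of a variant that handles that case as well. The data frame `df_mixed` and its third ID are hypothetical additions for illustration; they are not in the question.

```r
library(dplyr)

# Hypothetical data: ID 3 mixes a real date with an NA
df_mixed <- data.frame(
  ID = c(1L, 1L, 2L, 3L, 3L),
  Date = as.Date(c("2020-01-01", "2020-02-01", NA, "2020-03-01", NA)),
  INitials = c("AZ", "BE", "CC", "DD", "EE"),
  stringsAsFactors = FALSE
)

result <- df_mixed %>%
  group_by(ID) %>%
  filter(
    if (all(is.na(Date))) {
      # Group has no usable date: keep all of its rows
      rep(TRUE, n())
    } else {
      # Otherwise keep the rows tied for the latest non-missing date
      !is.na(Date) & Date == max(Date, na.rm = TRUE)
    }
  ) %>%
  ungroup()

print(result)
```

Branching with if avoids calling max(..., na.rm = TRUE) on a group that has no non-missing values, which would otherwise raise a warning.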

