简体   繁体   中英

Return most common value in column by group, replace null in that column with that value

I'd like to replace the na values in my df column with the most common value by group

#Ex:

df <- data.frame(Home_Abbr = c('PHI', 'PHI', 'DAL', 'PHI'),
                 Home_City = c('Philadelphia', 'Philadelphia', 'Dallas', NULL))

#Desired Result

Home_Abbr   Home_City

PHI         Philadelphia
PHI         Philadelphia
DAL         Dallas
PHI         Philadelphia

Here is what I've tried so far:

df <- df %>%
  group_by(Home_Abbr) %>%
  mutate(Home_City = names(which.max(table(Home_City))))

But when I run this I get a 'Can't combine NULL and non NULL results' Error.

We can use Mode function

 Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

and then replace

library(dplyr)
df %>% 
  group_by(Home_Abbr) %>%
  mutate(Home_City = replace(Home_City, is.na(Home_City), 
      Mode(Home_City))) %>%
  ungroup

-output

# A tibble: 4 × 2
  Home_Abbr Home_City   
  <chr>     <chr>       
1 PHI       Philadelphia
2 PHI       Philadelphia
3 DAL       Dallas      
4 PHI       Philadelphia

data

df <- structure(list(Home_Abbr = c("PHI", "PHI", "DAL", "PHI"), Home_City = c("Philadelphia", 
"Philadelphia", "Dallas", NA)), class = "data.frame", row.names = c(NA, 
-4L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM