
Return most common value in column by group, replace null in that column with that value

I'd like to replace the NA values in a column of my df with the most common value of that column within each group.

#Ex:

df <- data.frame(Home_Abbr = c('PHI', 'PHI', 'DAL', 'PHI'),
                 Home_City = c('Philadelphia', 'Philadelphia', 'Dallas', NA))

#Desired Result

Home_Abbr   Home_City

PHI         Philadelphia
PHI         Philadelphia
DAL         Dallas
PHI         Philadelphia

Here is what I've tried so far:

df <- df %>%
  group_by(Home_Abbr) %>%
  mutate(Home_City = names(which.max(table(Home_City))))

But when I run this, I get a 'Can't combine NULL and non NULL results' error.

We can use a Mode function, a small helper that returns the most frequent value in a vector:

Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # the value with the highest count
}
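One caveat worth checking: unique(x) keeps NA as a value, so in a group where NA is the most frequent entry, Mode returns NA and nothing gets replaced. The sketch below compares the answer's Mode with a variant that drops missing values before counting; the name Mode_na_rm is hypothetical and not part of the original answer. Because unique() keeps values in order of first appearance and which.max() returns the first maximum, ties go to the value that appears first.

```r
# Helper as given in the answer
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical variant: ignore NAs so a missing value is never chosen as the mode
Mode_na_rm <- function(x) {
  x <- x[!is.na(x)]                     # drop missing values before counting
  ux <- unique(x)                       # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))] # most frequent value; the first one wins ties
}

city <- c("Philadelphia", NA, NA)
Mode(city)                               # NA, because NA occurs most often
Mode_na_rm(city)                         # "Philadelphia"
Mode_na_rm(c("Dallas", "Philadelphia"))  # tie, so "Dallas" (it appears first)
Mode_na_rm(c(NA, NA))                    # nothing left to count, so NA
```

With the variant, an all-NA group still yields NA rather than an error, so replace() simply leaves that group unchanged.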

and then use replace() within each group:

library(dplyr)
df %>% 
  group_by(Home_Abbr) %>%
  # within each Home_Abbr group, fill missing cities with that group's most common city
  mutate(Home_City = replace(Home_City, is.na(Home_City), 
      Mode(Home_City))) %>%
  ungroup()

Output:

# A tibble: 4 × 2
  Home_Abbr Home_City   
  <chr>     <chr>       
1 PHI       Philadelphia
2 PHI       Philadelphia
3 DAL       Dallas      
4 PHI       Philadelphia
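For comparison, the same group-wise replacement can be sketched in base R with ave(), which applies a function to each group and returns a vector aligned with the original rows. This is a swapped-in technique, not the answer's method; it reuses the answer's Mode helper and the example data, and stringsAsFactors = FALSE is set explicitly (an assumption for older R versions) so Home_City stays a character column.

```r
# Answer's Mode helper
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df <- data.frame(Home_Abbr = c("PHI", "PHI", "DAL", "PHI"),
                 Home_City = c("Philadelphia", "Philadelphia", "Dallas", NA),
                 stringsAsFactors = FALSE)

# ave() splits Home_City by Home_Abbr, applies FUN to each piece,
# and writes the results back in the original row order
df$Home_City <- ave(df$Home_City, df$Home_Abbr,
                    FUN = function(v) replace(v, is.na(v), Mode(v)))
df
```

This avoids the dplyr dependency; the dplyr version above reads more naturally when the replacement is one step in a longer pipeline.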

Data:

df <- structure(list(Home_Abbr = c("PHI", "PHI", "DAL", "PHI"), Home_City = c("Philadelphia", 
"Philadelphia", "Dallas", NA)), class = "data.frame", row.names = c(NA, 
-4L))
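The data above writes the missing city as NA, not NULL. In R, NULL inside c() is simply dropped, so a vector built with it is one element short, and data.frame() then refuses to combine a 4-element column with a 3-element one. A quick check (the variable names are illustrative only):

```r
with_null <- c("Philadelphia", "Philadelphia", "Dallas", NULL)
with_na   <- c("Philadelphia", "Philadelphia", "Dallas", NA)

length(with_null)  # 3: NULL contributes nothing to c()
length(with_na)    # 4: NA is a real, missing element
is.na(with_na)     # FALSE FALSE FALSE TRUE

# Building the data frame from the NULL version fails (4 rows vs 3)
res <- tryCatch(
  data.frame(Home_Abbr = c("PHI", "PHI", "DAL", "PHI"), Home_City = with_null),
  error = function(e) conditionMessage(e)
)
res
```

NA is the placeholder that is.na(), replace(), and the grouped code above all expect.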

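Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the address of the original post. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM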