Return the most common value by group and replace the nulls in that column with it

Question

I'd like to replace the NA values in a column of my df with the most common value of that column within each group.

Example:

df <- data.frame(Home_Abbr = c('PHI', 'PHI', 'DAL', 'PHI'),
                 Home_City = c('Philadelphia', 'Philadelphia', 'Dallas', NA))

Desired result:

Home_Abbr   Home_City
PHI         Philadelphia
PHI         Philadelphia
DAL         Dallas
PHI         Philadelphia
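A side note on the example data: in R, a missing entry inside a vector has to be written as NA, not NULL. If the last Home_City entry were written as NULL, c() would silently drop it, leaving three values for four rows, and data.frame() would refuse to build the frame. The quick check below illustrates the difference; the variable names are only for illustration.

```r
# NULL is dropped by c(), so the vector ends up one element short
with_null <- c('Philadelphia', 'Philadelphia', 'Dallas', NULL)
length(with_null)   # 3

# NA keeps its slot and marks that value as missing
with_na <- c('Philadelphia', 'Philadelphia', 'Dallas', NA)
length(with_na)     # 4
is.na(with_na)      # FALSE FALSE FALSE TRUE
```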

Here is what I've tried so far:

df <- df %>%
  group_by(Home_Abbr) %>%
  mutate(Home_City = names(which.max(table(Home_City))))

But when I run this, I get a 'Can't combine NULL and non NULL results' error.

Answer 1

We can define a Mode function:

Mode <- function(x) {
  ux <- unique(x)                         # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]   # count each value, return the most frequent
}
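To see what Mode() does with missing values, the sketch below calls it on small vectors. Because unique() and match() treat NA as an ordinary value, NA is counted too: it is harmless here (Philadelphia outnumbers the NA in the PHI group), but in a group where NA is the most frequent entry, Mode() returns NA and nothing gets filled. The Mode_na_rm() variant is a hypothetical addition, not part of the answer, that drops NA before counting.

```r
# Mode() as defined in the answer: NA is counted like any other value
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Mode(c("Philadelphia", "Philadelphia", NA))  # "Philadelphia"
Mode(c(NA, NA, "Philadelphia"))              # NA: the missing value wins the count

# Hypothetical variant (not from the answer): drop NA before counting
Mode_na_rm <- function(x) {
  x <- x[!is.na(x)]
  # indexing an empty vector with NA_integer_ gives an NA of the same type
  if (length(x) == 0) return(x[NA_integer_])
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Mode_na_rm(c(NA, NA, "Philadelphia"))        # "Philadelphia"
```

On ties, which.max() returns the first maximum, so both functions pick whichever tied value appears first in the group.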

and then use replace() within each group to fill only the NA values:

library(dplyr)
df %>% 
  group_by(Home_Abbr) %>%
  # within each group, overwrite only the NA entries with the group's mode
  mutate(Home_City = replace(Home_City, is.na(Home_City), 
      Mode(Home_City))) %>%
  ungroup()

-output

# A tibble: 4 × 2
  Home_Abbr Home_City   
  <chr>     <chr>       
1 PHI       Philadelphia
2 PHI       Philadelphia
3 DAL       Dallas      
4 PHI       Philadelphia

data

df <- structure(list(Home_Abbr = c("PHI", "PHI", "DAL", "PHI"), Home_City = c("Philadelphia", 
"Philadelphia", "Dallas", NA)), class = "data.frame", row.names = c(NA, 
-4L))
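For readers without dplyr, the same group-wise fill can be sketched in base R with ave(), which applies FUN to each Home_Abbr group and returns a vector of the original length, so the result can be assigned straight back to the column. This is a swapped-in technique, not the answer's code; it reuses the answer's Mode() and sets stringsAsFactors = FALSE explicitly so Home_City stays a character vector on older R versions.

```r
# Mode() from the answer
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df <- data.frame(Home_Abbr = c("PHI", "PHI", "DAL", "PHI"),
                 Home_City = c("Philadelphia", "Philadelphia", "Dallas", NA),
                 stringsAsFactors = FALSE)

# ave() splits Home_City by Home_Abbr, applies FUN per group, and reassembles
# the pieces in the original row order
df$Home_City <- ave(df$Home_City, df$Home_Abbr,
                    FUN = function(x) replace(x, is.na(x), Mode(x)))
df
```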

Answer 1 (score: 2), accepted 2022-05-16 19:55:50
