Return most common value in column by group, replace null in that column with that value
I'd like to replace the NA values in a column of my df with the most common value of that column within each group.
# Example:
df <- data.frame(Home_Abbr = c('PHI', 'PHI', 'DAL', 'PHI'),
                 Home_City = c('Philadelphia', 'Philadelphia', 'Dallas', NA))
#Desired Result
Home_Abbr  Home_City
PHI        Philadelphia
PHI        Philadelphia
DAL        Dallas
PHI        Philadelphia
Here is what I've tried so far:
df <- df %>%
  group_by(Home_Abbr) %>%
  mutate(Home_City = names(which.max(table(Home_City))))
But when I run this, I get a 'Can't combine NULL and non NULL results' error.
We can define a Mode function
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
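One thing worth checking before relying on Mode: match() treats NA as a value like any other, so when NA is the most frequent entry in a group, Mode returns NA. The sketch below illustrates this. The helper mode_no_na is a hypothetical name and is not part of the original answer; it simply drops NA before counting.

```r
# Mode as defined above, repeated so the snippet runs on its own
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical variant (not in the original answer): ignore NA when counting
mode_no_na <- function(x) {
  x <- x[!is.na(x)]
  # All-NA input: return an NA of the same type as x
  if (length(x) == 0) return(x[NA_integer_])
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Mode(c("a", NA, NA, "b"))        # NA is the most frequent entry here
mode_no_na(c("a", NA, NA, "b"))  # NA is dropped before counting
```

For the example data in the question the two functions agree, because no group has NA as its most frequent value.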
and then use replace to fill in the NA values within each group
library(dplyr)
df %>%
  group_by(Home_Abbr) %>%
  mutate(Home_City = replace(Home_City, is.na(Home_City),
                             Mode(Home_City))) %>%
  ungroup()
-output
# A tibble: 4 × 2
  Home_Abbr Home_City
  <chr>     <chr>
1 PHI       Philadelphia
2 PHI       Philadelphia
3 DAL       Dallas
4 PHI       Philadelphia
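For comparison, the same fill can be sketched in base R without dplyr. This is only one possible approach, under a few assumptions: the column is character (names(table(x)) always returns strings), and fill_with_mode is a hypothetical helper name. It reuses the table()/which.max() idea from the question, but only assigns when a group has both missing and non-missing values, so a group that is entirely NA is left unchanged.

```r
df <- data.frame(
  Home_Abbr = c("PHI", "PHI", "DAL", "PHI"),
  Home_City = c("Philadelphia", "Philadelphia", "Dallas", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill NA with the most frequent non-NA value.
# table() drops NA by default, so only observed values are counted.
fill_with_mode <- function(x) {
  if (anyNA(x) && !all(is.na(x))) {
    x[is.na(x)] <- names(which.max(table(x)))
  }
  x
}

# ave() applies the helper separately within each Home_Abbr group
df$Home_City <- ave(df$Home_City, df$Home_Abbr, FUN = fill_with_mode)
df
```

Ties are broken by which.max(), which picks the first maximum in table() order (sorted values), so the result for tied groups may differ from the Mode function above, which prefers the value that appears first in the data.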
data
df <- structure(list(Home_Abbr = c("PHI", "PHI", "DAL", "PHI"), Home_City = c("Philadelphia",
"Philadelphia", "Dallas", NA)), class = "data.frame", row.names = c(NA,
-4L))