Replace NA values with modal value for factor variables in dplyr
Suppose I have the following data.frame. I want to replace the NA with the most common response, a.
df <- read.table(text = "id result
1 a
2 a
3 a
4 b
5 NA", header = TRUE)
I am looking for something like this:
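library(dplyr)

# return the most frequent value of x (the first one wins on ties)
calculate_mode <- function(x) {
  uniqx <- unique(x)
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

# note: ifelse() drops factor attributes, so this suits character columns
df <- df %>%
  mutate(result = ifelse(is.na(result), calculate_mode(result), result))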
But I am not sure whether there is a "tidier" way to do this, other than defining a custom function.
library(dplyr)
library(tidyr)
# manually get the most frequent value, then fill with tidyr::replace_na
most_value <- table(df$result) %>%
  sort(decreasing = TRUE) %>%
  head(1) %>%
  names()
df %>% replace_na(list(result = most_value))
#> id result
#> 1 1 a
#> 2 2 a
#> 3 3 a
#> 4 4 b
#> 5 5 a
# do it across multiple columns - still kind of using functions
most <- function(x) {
  table(x) %>% sort(decreasing = TRUE) %>% head(1) %>% names()
}
multiple_column <- left_join(df, df, by = "id")
multiple_column
#> id result.x result.y
#> 1 1 a a
#> 2 2 a a
#> 3 3 a a
#> 4 4 b b
#> 5 5 <NA> <NA>
multiple_column %>%
  mutate(across(.cols = starts_with("result"), .fns = function(x) {
    if_else(is.na(x), most(x), x)
  }))
#> id result.x result.y
#> 1 1 a a
#> 2 2 a a
#> 3 3 a a
#> 4 4 b b
#> 5 5 a a
Created on 2021-04-24 by the reprex package (v2.0.0)
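The title asks about factor variables, but under R >= 4.0 `read.table()` returns `result` as a character column, so the code above is only exercised on character data. The following is a minimal sketch of a variant that keeps a factor column as a factor. The helper name `impute_mode` and the example frame `df_factor` are hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical helper (not from the thread): fill NAs with the most frequent
# non-missing value. Intended for factor or character vectors only.
impute_mode <- function(x) {
  # table() drops NA by default; which.max() takes the first value on ties
  mode_value <- names(which.max(table(x)))
  if (length(mode_value) == 0) return(x)  # all-NA input: nothing to impute
  # assigning an existing level by name keeps a factor a factor
  x[is.na(x)] <- mode_value
  x
}

# Assumed example data with an explicit factor column
df_factor <- data.frame(
  id = 1:5,
  result = factor(c("a", "a", "a", "b", NA))
)

df_filled <- df_factor %>%
  mutate(across(where(is.factor), impute_mode))
```

Because the replacement is done by indexed assignment rather than `ifelse()`, the levels of the column are left unchanged.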
Not short, but arguably tidy:
library(dplyr)
df %>%
  count(result, sort = TRUE) %>%
  slice(1) %>%
  rename(mode_value = result) %>%
  select(-n) %>%
  bind_cols(df, .) %>%
  mutate(result = coalesce(result, mode_value))
# id result mode_value
#1 1 a a
#2 2 a a
#3 3 a a
#4 4 b a
#5 5 a a
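One possible variation on the same idea, sketched under the assumption that `result` is a character column: extract the mode as a scalar with `pull()` so that no helper column is left in the output, and filter out NA before counting so that NA can never be picked as the mode. The names `df_chr` and `df_coalesced` are illustrative only.

```r
library(dplyr)

# Assumed example data, matching the question but built explicitly as character
df_chr <- data.frame(
  id = 1:5,
  result = c("a", "a", "a", "b", NA),
  stringsAsFactors = FALSE
)

mode_value <- df_chr %>%
  filter(!is.na(result)) %>%      # keep NA from being counted as the mode
  count(result, sort = TRUE) %>%  # most frequent value first
  slice(1) %>%
  pull(result)                    # a length-one character vector

df_coalesced <- df_chr %>%
  mutate(result = coalesce(result, mode_value))
```

This trades the single pipeline for two steps, which may or may not count as tidier depending on taste.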