
Replace NA values with modal value for factor variables in dplyr

Suppose I have the following data.frame. I'd like to replace the NA with the most commonly occurring response, "a".

df <- read.table(text = "id result
1 a
2 a
3 a
4 b
5 NA", header = TRUE)

I'm looking for something like this:

calculate_mode <- function(x) {
  uniqx <- unique(x)
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

library(dplyr)
df <- df %>%
  mutate(result = ifelse(is.na(result), calculate_mode(result), result))

But I'm not sure whether there is a more "tidy" way of doing this without defining a custom function.

  library(dplyr)
  library(tidyr)
  
  # manually get the most frequent value, then fill the NAs with tidyr::replace_na()
  most_value <- table(df$result) %>% sort(decreasing = TRUE) %>%
    head(1) %>% names()
  df %>% replace_na(list(result = most_value))
#>   id result
#> 1  1      a
#> 2  2      a
#> 3  3      a
#> 4  4      b
#> 5  5      a
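
The pipeline above works because `table()` drops `NA` by default, so the missing value can never be picked as the mode. The same idea can be sketched in base R, under the assumption that `result` holds at least one non-missing value; `mode_value` and `filled` are only illustrative names:

```r
df <- read.table(text = "id result
1 a
2 a
3 a
4 b
5 NA", header = TRUE)

# table() ignores NA by default; which.max() returns the first maximum,
# so ties go to the value that sorts first
mode_value <- names(which.max(table(df$result)))

# work on a copy so the original df stays untouched
filled <- df
filled$result[is.na(filled$result)] <- mode_value
filled$result
```

Because subassignment into a factor accepts a character value that matches an existing level, this should behave the same whether `result` is read in as character or as factor.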

Applying it dynamically across multiple columns

  # do it across multiple columns - still relies on a helper function
  most <- function(x) {
    table(x) %>% sort(decreasing = TRUE) %>% head(1) %>% names()
  }
  multiple_column <- left_join(df, df, by = "id")
  multiple_column
#>   id result.x result.y
#> 1  1        a        a
#> 2  2        a        a
#> 3  3        a        a
#> 4  4        b        b
#> 5  5     <NA>     <NA>
  
  multiple_column %>%
    mutate(across(.cols = starts_with("result"), .fns = function(x) {
      if_else(is.na(x), most(x), x)
    }))
#>   id result.x result.y
#> 1  1        a        a
#> 2  2        a        a
#> 3  3        a        a
#> 4  4        b        b
#> 5  5        a        a

Created on 2021-04-24 by the reprex package (v2.0.0)
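
The question title mentions factor variables, while `if_else()` is strict about types, so the snippet above is easiest to reason about for character columns. Below is a hedged sketch for factor columns that uses plain subassignment inside `across()`. `fill_mode` and `df_fct` are hypothetical names, and the helper assumes each column has at least one non-missing value:

```r
library(dplyr)

# hypothetical helper: replace NA with the most frequent non-missing value
fill_mode <- function(x) {
  # table() skips NA; names() gives the level label as a character string,
  # which can be assigned into a factor because it is an existing level
  x[is.na(x)] <- names(which.max(table(x)))
  x
}

df_fct <- data.frame(
  id = 1:5,
  result = factor(c("a", "a", "a", "b", NA))
)

out <- df_fct %>%
  mutate(across(where(is.factor), fill_mode))
out
```

The helper is meant for factor or character columns only; applied to a numeric column it would coerce the values to character.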

Not shorter, but perhaps tidier:

library(dplyr)

df %>%
  count(result, sort = TRUE) %>%
  slice(1) %>%
  rename(mode_value = result) %>%
  select(-n) %>%
  bind_cols(df, .) %>%
  mutate(result = coalesce(result, mode_value))

#  id result mode_value
#1  1      a          a
#2  2      a          a
#3  3      a          a
#4  4      b          a
#5  5      a          a
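
One caveat with this approach: `count()` keeps `NA` as its own group, so if `NA` were the most frequent entry, `slice(1)` would select it as the mode. A sketch of a variant that filters missing values first and pulls the mode out as a scalar rather than binding a helper column; `df_na` is made-up data for illustration only:

```r
library(dplyr)

# illustrative data in which NA occurs more often than any real value
df_na <- data.frame(
  id = 1:6,
  result = c("a", NA, NA, NA, "b", "b"),
  stringsAsFactors = FALSE
)

mode_value <- df_na %>%
  filter(!is.na(result)) %>%       # drop NA before counting
  count(result, sort = TRUE) %>%
  slice(1) %>%
  pull(result)

out <- df_na %>%
  mutate(result = coalesce(result, mode_value))
out
```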
