
Recode a numeric variable with many values including NAs

How can I recode a numeric variable that has many distinct values, including missing values, so that it takes the values 0:(n-1), where n is the number of unique values (counting NA as one of them), in a tidy way?

Example:

library(tidyverse)  # tibble, dplyr, stringr and forcats are all used below
df <- tibble(x = c(1000, 1000, NA, 1001, 1002, 1003, NA, 1003))

Required output (any recoding scheme is acceptable as long as the values are 0:(n-1)):

# A tibble: 8 x 2
      x     y
  <dbl> <dbl>
1  1000     0
2  1000     0
3    NA     4
4  1001     1
5  1002     2
6  1003     3
7    NA     4
8  1003     3

I can do this in a tidy way by first converting x to a factor, then using fct_recode() with a named vector (built programmatically, because x has many values), and finally converting back to numeric:

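# Turn x into a factor, giving NA its own explicit level
df <- df %>% mutate(x_fct = factor(case_when(
    is.na(x) ~ "level_na",
    TRUE ~ str_c("level_", x)
  )))
# Build a named vector that maps the new names "0".."n-1" to the existing levels
x_levels <- levels(df$x_fct)
n_levels <- length(x_levels)
names(x_levels) <- as.character(0:(n_levels - 1))
# Rename the levels, then shift the factor's 1-based integer codes to 0-based
df <- df %>%
  mutate(y = as.numeric(fct_recode(x_fct, !!!x_levels)) - 1)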
df
# A tibble: 8 x 3
      x x_fct          y
  <dbl> <fct>      <dbl>
1  1000 level_1000     0
2  1000 level_1000     0
3    NA level_na       4
4  1001 level_1001     1
5  1002 level_1002     2
6  1003 level_1003     3
7    NA level_na       4
8  1003 level_1003     3

But this seems very cumbersome. Surely there's a simpler way, preferably in a single pipe.

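One way is to combine match() with unique(): match() returns the position of each value of x within the vector of unique values, and it treats NA as a matchable value, so NA gets its own code. Wrapping unique(x) in sample() shuffles that vector, so the codes are assigned at random and the output below will differ from run to run.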

library(dplyr)

df %>%
  mutate(level = paste('level', x, sep = '_'), 
         y = match(x, sample(unique(x))) - 1)

#      x level          y
#  <dbl> <chr>      <dbl>
#1  1000 level_1000     4
#2  1000 level_1000     4
#3    NA level_NA       2
#4  1001 level_1001     0
#5  1002 level_1002     1
#6  1003 level_1003     3
#7    NA level_NA       2
#8  1003 level_1003     3
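
If a reproducible coding is preferred, sample() can simply be dropped. The following is a minimal sketch rather than part of the original answer; the column names y_first and y_sorted are made up for illustration. The first variant numbers values in order of first appearance, and the second sorts the unique values with NA last, which should reproduce the coding shown in the question.

```r
library(dplyr)

df <- tibble(x = c(1000, 1000, NA, 1001, 1002, 1003, NA, 1003))

recoded <- df %>%
  mutate(
    # Codes in order of first appearance; match() matches NA to NA
    y_first = match(x, unique(x)) - 1,
    # Codes in ascending order of x, with NA assigned the last code
    y_sorted = match(x, sort(unique(x), na.last = TRUE)) - 1
  )

recoded$y_first
# 0 0 1 2 3 4 1 4
recoded$y_sorted
# 0 0 4 1 2 3 4 3
```

Both variants stay inside a single pipe and need no intermediate factor column.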
