
Can I write a function to revalue levels of a factor?

I have a column 'lg_with_children' in my data frame that has 6 levels: 'Half and half', 'Mandarin', 'Shanghainese', 'Other', 'N/A', and 'Not important'. I want to condense these levels down to just 2 levels, 'Shanghainese' and 'Other'.

In order to do this I used the revalue() function from the plyr package to successfully rename the levels. I used the code below and it worked fine.

data$lg_with_children <- revalue(data$lg_with_children,
                             c("Mandarin" = "Other"))
data$lg_with_children <- revalue(data$lg_with_children,
                             c("Half and half" = "Other"))
data$lg_with_children <- revalue(data$lg_with_children,
                             c("N/A" = "Other"))
data$lg_with_children <- revalue(data$lg_with_children,
                             c("Not important" = "Other"))

To condense the code a little, I went back to the data as it was before I revalued the levels and attempted to write a function. I tried the following after doing some research on how to write your own functions (I'm rather new at this).

revalue_factor_levels <- function(df, col, source, target) {df$col <- revalue(df$col, c("source" = "target"))}

I intentionally left the df, col, source, and target generic because I need to revalue some other columns in the same way.

Next, I tried to run the code with the arguments filled in and got this message:

warning message

I am not quite sure what the problem is. I tried the following adjustment to the code, but it still did not work.

revalue_factor_levels <- function(df, col, source, target) {df$col <- revalue(df$col, c(source = target))}

Any guidance is appreciated. Thanks.

You can write your own function to recode the levels. The easiest way to do that is probably to assign to the levels directly, with each old level given as a character string: levels(fac) <- list(new_lvl1 = c(old_lvl1, old_lvl2), new_lvl2 = c(old_lvl3, old_lvl4))
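As a rough sketch of that base R approach, here is how it might look on a toy factor. The vector lg and its values are made up to stand in for data$lg_with_children. Each list element is named after a new level and holds the old levels it should absorb.

```r
# Toy stand-in for data$lg_with_children (values invented for illustration)
lg <- factor(c("Mandarin", "Shanghainese", "Half and half",
               "N/A", "Not important", "Other", "Shanghainese"))

# Name of each list element = new level; contents = old levels merged into it.
# Old levels that are not listed anywhere would become NA, so list them all.
levels(lg) <- list(
  Shanghainese = "Shanghainese",
  Other = c("Half and half", "Mandarin", "N/A", "Not important", "Other")
)

levels(lg)
table(lg)
```

Note that any old level left out of the list is turned into NA, which is one reason a dedicated helper can be safer.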

But there are already several functions that do it out of the box. I typically use the forcats package to manipulate factors.

Check out fct_recode from the forcats package (see its documentation).
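A minimal sketch of fct_recode on a toy factor, assuming the forcats package is installed. The vector lg and its values are invented for illustration. The arguments are written as new_level = "old_level", and several old levels can be mapped to the same new one.

```r
library(forcats)

# Toy stand-in for data$lg_with_children (values invented for illustration)
lg <- factor(c("Mandarin", "Shanghainese", "Half and half",
               "N/A", "Not important", "Other"))

# Each argument is new_level = "old_level"; levels not mentioned are kept as-is
lg2 <- fct_recode(lg,
  Other = "Mandarin",
  Other = "Half and half",
  Other = "N/A",
  Other = "Not important"
)

levels(lg2)
```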

There are also other functions that could help you - check out the comments below.

Now, as to why your code isn't working:

  • df$col looks for a column literally named col. The workaround is to use df[[col]] instead.
  • Don't forget to return df at the end of your function; otherwise the caller never receives the modified data frame.
  • c(source = target) will create a vector with one element named "source", regardless of what happens to be in the variable source. The solution is to create the named vector in 2 steps:
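revalue_factor_levels <- function(df, col, source, target) {
  # Build the named vector in two steps so the name comes from the value of `source`
  to_rename <- target
  names(to_rename) <- source
  # [[col]] looks up the column whose name is stored in `col`
  df[[col]] <- plyr::revalue(df[[col]], to_rename)
  # Return the modified data frame
  df
}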
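The naming behaviour described in the last bullet can be checked in plain R. This is only a small sketch: setNames() from the stats package is offered as a one-step alternative to the two-step approach, and the names literal, by_value, old and mapping are purely illustrative.

```r
source <- "Mandarin"
target <- "Other"

# The name is taken from the text typed before `=`, not from the variable
literal <- c(source = target)
names(literal)

# setNames() uses the value stored in `source` as the name
by_value <- setNames(target, source)
names(by_value)

# The same idea extends to several old levels mapped to one new level
old <- c("Mandarin", "Half and half", "N/A", "Not important")
mapping <- setNames(rep("Other", length(old)), old)
mapping
```

A named vector like mapping holds all the renames at once, so it could in principle be passed to a single revalue() call instead of one call per level.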

Returning the df means the syntax is:

data <- revalue_factor_levels(data, "lg_with_children", "Mandarin", "Other")

I like functions that take the data as the first argument and return the modified data because they are pipeable.

library(dplyr)

data <- data %>%
  revalue_factor_levels("lg_with_children", "Mandarin", "Other") %>%
  revalue_factor_levels("lg_with_children", "Half and half", "Other") %>%
  revalue_factor_levels("lg_with_children", "N/A", "Other")

Still, using forcats is easier and less prone to breaking on edge cases.

Edit:

There is nothing preventing you from both using forcats and creating your custom function. For example, this is closer to what you want to achieve:

revalue_factor_levels <- function(df, col, ref_level) {
  df[[col]] <- forcats::fct_other(df[[col]], keep = ref_level)
  df
}

# Will keep Shanghainese and revalue all other levels to "Other".
data <- revalue_factor_levels(data, "lg_with_children", "Shanghainese")

Here is what I ended up with thanks to help from the community.

revalue_factor_levels <- function(df, col, ref_level) {
  df[[col]] <- forcats::fct_other(df[[col]], keep = ref_level)
  df
}

data <- revalue_factor_levels(data, "lg_with_children", "Shanghainese")
