
modifying levels in factor variable using ifelse

I wanted to modify the levels of a factor variable by grouping several levels into one when I came across this strange situation. The new level is created, but the levels I wanted to keep are shifted up by one ("0" becomes "1" and "1" becomes "2"). Here is my example data, the code used and the output.

library(tidyverse) 
data <- structure(list(factor1 = structure(c(1L, 1L, 2L, 3L, 1L, 2L, 
        1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
        1L, 1L, 1L, 3L, 1L, 1L, 1L, 4L), .Label = c("0", "1", "2", "3", 
        "4", "5", "6", "7"), class = "factor")), row.names = c(NA, -30L
        ), class = c("tbl_df", "tbl", "data.frame"), .Names = "factor1")
data_out <- data %>% mutate(factor1 = ifelse(factor1 %in% c('0', '1'), 
                                             factor1, '>1'))
# Result, shown with dput(data_out):
structure(list(factor1 = c("1", "1", "2", ">1", "1", "2", "1", 
"1", "2", "2", "2", "2", "2", "1", "2", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", ">1", "1", "1", "1", ">1")), .Names = "factor1", 
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -30L))

Is this the intended behaviour? It certainly isn't what I want. How can it be explained, and how can it be corrected?

I'm guessing this problem revolves around the way factors are constructed. How a factor goes from having levels of {"0", "1"} to values {"1", "2", ">1"} by way of mutate is still not clear to me.

R factors are actually 1-based integer vectors with a levels attribute. So your "0" values were stored as integer 1s and your "1" values as integer 2s. Apparently the mutate call produced a column with an additional value printed as ">1", but also turned the "0" level into "1" and the "1" level into "2". This looks like dangerous behavior to me. I think it should have given you either a new factor with levels "0", "1", ">1" or it should have thrown an error.
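To make the integer-code point concrete, here is a minimal base R sketch. The small factor `f` is made up for illustration and is not the question's data; it only mirrors its first few values.

```r
# A small illustrative factor (hypothetical, not the question's full data)
f <- factor(c("0", "0", "1", "2"), levels = c("0", "1", "2"))

labels_f <- levels(f)        # the level labels
codes_f  <- as.integer(f)    # the underlying 1-based integer codes
chars_f  <- as.character(f)  # the labels looked up through the codes

# ifelse() passes the codes through, not the labels, and the result
# becomes character because ">1" is a string
res <- ifelse(c(TRUE, TRUE, TRUE, FALSE), f, ">1")

print(codes_f)  # 1 1 2 3
print(res)      # "1" "1" "2" ">1"
```

The label "0" is stored as code 1 and "1" as code 2, which is exactly the shift seen in the question's output.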

The error comes from ifelse, not mutate: ifelse drops the factor's levels and returns the underlying integer codes, which are then coerced to character alongside ">1". If you coerce data to a plain data frame and use base R, you see the same thing:

data <- as.data.frame(data)   # drop the tibble classes
data$factor2 <- ifelse(data$factor1 %in% c('0', '1'),
                       data$factor1, '>1')
data
#-------- same issue except
   factor1 factor2
1        0       1
2        0       1
3        1       2
4        2      >1
... (remaining 26 rows omitted)
> str(data)
'data.frame':   30 obs. of  2 variables:
 $ factor1: Factor w/ 8 levels "0","1","2","3",..: 1 1 2 3 1 2 1 1 2 2 ...
 $ factor2: chr  "1" "1" "2" ">1" ...
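One way to keep the ifelse approach is to convert the factor to character before the test, then rebuild the factor with the levels you want. The sketch below uses a shortened, made-up vector `factor1` with the same eight levels as the question; the names `grouped_chr` and `grouped` are illustrative only.

```r
# Shortened stand-in for the question's column (same levels, fewer values)
factor1 <- factor(c("0", "0", "1", "2"),
                  levels = c("0", "1", "2", "3", "4", "5", "6", "7"))

# Convert to character first so ifelse() returns labels, not integer codes
grouped_chr <- ifelse(factor1 %in% c("0", "1"), as.character(factor1), ">1")

# Rebuild a factor with an explicit level order
grouped <- factor(grouped_chr, levels = c("0", "1", ">1"))

print(grouped_chr)      # "0" "0" "1" ">1"
print(levels(grouped))  # "0" "1" ">1"
```

The same expression should work inside mutate by wrapping the ifelse call in factor(..., levels = c("0", "1", ">1")).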

Using recode_factor would have let you stay within the dplyr package:

recode_factor(data$factor1, `0` = "0", `1` = "1", .default=">1")
 [1] 0  0  1  >1 0  1  0  0  1  1  1  1  1  0  1  0  0  0  0  0  0  0  0  0  0  >1 0  0  0  >1
Levels: 0 1 >1

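If anyone runs into a similar problem in the future and is looking for a simple way to group these factor levels without reassigning the remaining ones: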

fct_collapse(data$factor1, '>1' = c('2', '3')) 
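Note that the call above only merges the levels it names ("2" and "3"); levels "4" to "7" stay as they are. If every level other than "0" and "1" should be merged, a package-free alternative is to overwrite the levels directly, since assigning duplicate labels through `levels<-` merges them. This is a sketch on a made-up vector `f`; `keep` is an illustrative name.

```r
# Illustrative factor with the same eight levels as the question
f <- factor(c("0", "1", "2", "3", "7"),
            levels = c("0", "1", "2", "3", "4", "5", "6", "7"))

keep <- c("0", "1")  # levels to leave untouched

# Rename every other level to ">1"; duplicated labels are merged
levels(f)[!levels(f) %in% keep] <- ">1"

print(levels(f))        # "0" "1" ">1"
print(as.character(f))  # "0" "1" ">1" ">1" ">1"
```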
