
How to convert factor levels to integers in R

I have the following data frame in R:

  ID      Season      Year       Weekday
  1       Winter      2017       Monday
  2       Winter      2018       Tuesday
  3       Summer      2017       Monday
  4       Summer      2018       Wednsday

I want to convert these factor levels to integers. This is my desired data frame:

  ID      Season      Year       Weekday
  1       1           1          1
  2       1           2          2
  3       2           1          1
  4       2           2          3

  Winter = 1, Summer = 2
  2017 = 1, 2018 = 2
  Monday = 1, Tuesday = 2, Wednesday = 3

Currently, I am using nested ifelse() calls for each of these three columns:

  otest_xgb$Weekday <- as.integer(ifelse(otest_xgb$Weekday == "Monday",1,
                                   ifelse(otest_xgb$Weekday == "Tuesday",2,
                                          ifelse(otest_xgb$Weekday == "Wednesday",3,
                                                 ifelse(otest_xgb$Weekday == "Thursday",4,5)))))

Is there any way to avoid writing long nested ifelse() calls?

> m <- dat
> m[] <- lapply(dat, function(x) as.integer(factor(x, levels = unique(x))))
> m
  ID Season Year Weekday
1  1      1    1       1
2  2      1    2       2
3  3      2    1       1
4  4      2    2       3
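A small base-R sketch of why passing unique(x) as the levels works: factor() numbers its levels in the order they are given, and unique() returns values in order of first appearance. The vector `season` below is made up for illustration and is not part of the original answer.

```r
# Illustrative vector (not taken from the question's data frame)
season <- c("Winter", "Winter", "Summer", "Summer")

# Default: levels are sorted alphabetically, so Summer = 1 and Winter = 2
as.integer(factor(season))
# [1] 2 2 1 1

# Levels in order of first appearance, so Winter = 1 and Summer = 2
as.integer(factor(season, levels = unique(season)))
# [1] 1 1 2 2
```

The second form matches the numbering the question asks for only because each column's values happen to appear in the desired order.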

We can use match() with the unique elements of each column:

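library(dplyr)
# funs() is deprecated since dplyr 0.8.0; across() (dplyr >= 1.0.0) is the current idiom
dat %>%
      mutate(across(everything(), ~ match(.x, unique(.x))))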
#   ID Season Year Weekday
#1  1      1    1       1
#2  2      1    2       2
#3  3      2    1       1
#4  4      2    2       3

Ordered and nominal factor variables need to be handled separately. Converting a factor column directly to integer or numeric yields codes that follow the lexicographical (alphabetical) order of the levels.

Here Weekday is conceptually ordinal, Year is an integer, and Season is generally nominal. However, this is subjective and depends on the kind of analysis required.

For example, when you convert directly from factor to integer, Wednesday in the Weekday column gets a higher value than both Saturday and Tuesday:

dat[] <- lapply(dat, function(x) as.integer(factor(x)))
dat

#  ID Season Year Weekday
#1  1      2    1       1
#2  2      2    2       3
#3  3      1    1       2   (Saturday)
#4  4      1    2       4   (Wednesday): assigned a value greater than that of Saturday

Therefore, you can convert directly from factor to integer only for the Season and Year columns. Note that this works for the Year column because its lexicographical order matches its ordinal order.

dat[c('Season', 'Year')] <- lapply(dat[c('Season', 'Year')], 
                                   function(x) as.integer(factor(x)))

Weekday needs to be converted via an ordered factor with the desired order of levels. The default alphabetical order might be harmless for general aggregation, but it can drastically affect results when fitting statistical models.

dat$Weekday <- as.integer(factor(dat$Weekday, 
                          levels = c("Monday", "Tuesday", "Wednesday", "Thursday", 
                                     "Friday", "Saturday", "Sunday"), ordered = TRUE))

dat
#  ID Season Year Weekday
#1  1      2    1       1
#2  2      2    2       2
#3  3      1    1       6  (Saturday)
#4  4      1    2       3  (Wednesday): assigned value less than that of Saturday
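Because any value that is not listed in levels silently becomes NA, it can help to check for that during the conversion. A minimal sketch, assuming a hypothetical helper named weekday_to_int (not part of the original answer) that uses match() against the full list of weekday names:

```r
# Hypothetical helper: map weekday names to 1..7 (Monday = 1) and
# stop if any non-missing value is not a recognised weekday name.
weekday_to_int <- function(x) {
  days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
  x <- as.character(x)
  codes <- match(x, days)
  # Values that failed to match but were not NA to begin with
  bad <- unique(x[is.na(codes) & !is.na(x)])
  if (length(bad) > 0) {
    stop("Unrecognised weekday value(s): ", paste(bad, collapse = ", "))
  }
  codes
}

weekday_to_int(c("Monday", "Tuesday", "Saturday", "Wednesday"))
# [1] 1 2 6 3
```

A misspelling such as "Wednsday" would raise an error here instead of producing an NA that surfaces later in a model.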

Data Used:

dat <- read.table(text="  ID      Season      Year       Weekday
1       Winter      2017       Monday
2       Winter      2018       Tuesday
3       Summer      2017       Saturday
4       Summer      2018       Wednesday", header = TRUE)

You can simply use as.numeric() to convert a factor to numeric. Each value is replaced by the integer code of its factor level, that is, the position of that level in levels():

library(dplyr)

### Change factor levels to the levels you specified
otest_xgb$Season  <- factor(otest_xgb$Season , levels = c("Winter", "Summer"))
otest_xgb$Year    <- factor(otest_xgb$Year   , levels = c(2017, 2018))
otest_xgb$Weekday <- factor(otest_xgb$Weekday, levels = c("Monday", "Tuesday", "Wednesday"))

otest_xgb %>% 
  dplyr::mutate_at(c("Season", "Year", "Weekday"), as.numeric)


# ID Season Year Weekday
# 1  1      1    1       1
# 2  2      1    2       2
# 3  3      2    1       1
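# 4  4      2    2      NA
# (Row 4 is NA because the question's data spells the day "Wednsday",
#  which matches none of the levels specified above.)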
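One caveat worth illustrating: as.numeric() on a factor returns the level codes, not the labels, even when the labels look like numbers. The factor `year` below is an illustrative example, not the asker's data.

```r
# Illustrative factor whose labels look like numbers
year <- factor(c("2017", "2018", "2017"))

# Integer codes of the levels (what the question asks for)
as.numeric(year)
# [1] 1 2 1

# The labels themselves, converted to numbers
as.numeric(as.character(year))
# [1] 2017 2018 2017

# Same result, converting each level only once
as.numeric(levels(year))[year]
# [1] 2017 2018 2017
```

Here the codes are what is wanted, but the distinction matters whenever the original numeric values need to be recovered.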

Once you have converted Season, Year and Weekday to factors, use contrasts() to see the dummy (indicator) coding that each factor would get:

contrasts(factor(dat$Season))
contrasts(factor(dat$Year))
contrasts(factor(dat$Weekday))
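Note that contrasts() only returns the coding matrix; it does not modify the data. To get actual indicator columns, one option in base R is model.matrix(). A minimal sketch, assuming an illustrative data frame `df` with a single Season column (not the asker's object):

```r
# Illustrative data frame with Season as an unordered factor
df <- data.frame(
  Season = factor(c("Winter", "Winter", "Summer", "Summer"),
                  levels = c("Winter", "Summer"))
)

# Coding matrix: one row per level, one column per non-reference level
contrasts(df$Season)
#        Summer
# Winter      0
# Summer      1

# model.matrix() applies that coding to every row of the data
mm <- model.matrix(~ Season, data = df)
mm[, "SeasonSummer"]   # 1 for Summer rows, 0 for Winter rows
```

With the default treatment contrasts, the first level (Winter here) is the reference and gets no column of its own.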

