
How to convert factor levels to integer in R

I have the following data frame in R:

  ID      Season      Year       Weekday
  1       Winter      2017       Monday
  2       Winter      2018       Tuesday
  3       Summer      2017       Monday
  4       Summer      2018       Wednsday

I want to convert these factor levels to integers. The following is my desired data frame:

  ID      Season      Year       Weekday
  1       1           1          1
  2       1           2          2
  3       2           1          1
  4       2           2          3

  Winter = 1,Summer =2
  2017 = 1 , 2018 = 2
  Monday = 1,Tuesday = 2,Wednesday = 3

Currently, I am using a nested ifelse for each of the three columns:

  otest_xgb$Weekday <- as.integer(ifelse(otest_xgb$Weekday == "Monday",1,
                                   ifelse(otest_xgb$Weekday == "Tuesday",2,
                                          ifelse(otest_xgb$Weekday == "Wednesday",3,
                                                 ifelse(otest_xgb$Weekday == "Thursday",4,5)))))

Is there any way to avoid writing a long ifelse chain?

m <- dat
m[] <- lapply(dat, function(x) as.integer(factor(x, levels = unique(x))))
m
#   ID Season Year Weekday
# 1  1      1    1       1
# 2  2      1    2       2
# 3  3      2    1       1
# 4  4      2    2       3
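A minimal sketch of why the levels argument matters in the snippet above: by default factor() sorts the levels, so the integer codes follow alphabetical order, while passing unique(x) numbers the values in order of first appearance. The vector `season` is made up for illustration.

```r
# Hypothetical example vector, not taken from the question's data frame
season <- c("Winter", "Winter", "Summer", "Summer")

# Default: levels are sorted, so Summer = 1 and Winter = 2
default_codes <- as.integer(factor(season))

# Levels in order of first appearance, so Winter = 1 and Summer = 2
appearance_codes <- as.integer(factor(season, levels = unique(season)))

print(default_codes)     # 2 2 1 1
print(appearance_codes)  # 1 1 2 2
```

The first-appearance numbering depends on row order, so it only gives Winter = 1 here because Winter happens to come first.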

We can use match() with the unique elements:

library(dplyr)
# funs() is deprecated; across() requires dplyr >= 1.0.0
dat %>%
  mutate(across(everything(), ~ match(.x, unique(.x))))
#   ID Season Year Weekday
#1  1      1    1       1
#2  2      1    2       2
#3  3      2    1       1
#4  4      2    2       3

Ordered and nominal factor variables need to be handled separately. Directly converting a factor column to integer or numeric yields codes that follow the lexicographical (alphabetical) order of the levels.

Here Weekday is conceptually ordinal, Year is integer, and Season is generally nominal. However, this is subjective and depends on the kind of analysis required.

For example, when you convert directly from factor to integer, Wednesday in the Weekday column gets a higher value than both Saturday and Tuesday:

 dat[] <- lapply(dat, function(x)as.integer(factor(x)))
 dat 

#  ID Season Year Weekday
#1  1      2    1       1
#2  2      2    2       3
#3  3      1    1       2   (Saturday)
#4  4      1    2       4   (Wednesday): assigned value greater than that of Saturday

Therefore, you can convert directly from factor to integer only for the Season and Year columns. Note that for the Year column this works because its lexicographical order matches its natural order.

dat[c('Season', 'Year')] <- lapply(dat[c('Season', 'Year')], 
                                   function(x) as.integer(factor(x)))

Weekday needs to be converted from an ordered factor variable with the desired order of levels. The difference may be harmless for general aggregation, but it can drastically affect results when fitting statistical models.

dat$Weekday <- as.integer(factor(dat$Weekday, 
                          levels = c("Monday", "Tuesday", "Wednesday", "Thursday", 
                                     "Friday", "Saturday", "Sunday"), ordered = TRUE))

dat
#  ID Season Year Weekday
#1  1      2    1       1
#2  2      2    2       2
#3  3      1    1       6  (Saturday)
#4  4      1    2       3  (Wednesday): assigned value less than that of Saturday
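One caveat with explicit levels is that any value missing from the levels vector silently becomes NA. The sketch below uses a hypothetical helper, `to_codes`, that maps values to their position in a level vector and warns about unmatched values. It is an illustration under these assumptions, not part of the answer above.

```r
# Hypothetical helper: integer codes by position in `levels`, with a warning
# for values that do not appear in `levels` (for example, misspellings).
to_codes <- function(x, levels) {
  codes <- match(as.character(x), levels)
  unmatched <- unique(x[is.na(codes) & !is.na(x)])
  if (length(unmatched) > 0) {
    warning("values not in levels: ", paste(unmatched, collapse = ", "))
  }
  codes
}

weekday_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                    "Friday", "Saturday", "Sunday")

# "Wednsday" is deliberately misspelled to trigger the warning
codes <- to_codes(c("Monday", "Saturday", "Wednsday"), weekday_levels)
print(codes)  # 1 6 NA
```

Using match() against the level vector gives the same codes as as.integer(factor(x, levels = ...)), while making it easy to inspect which inputs failed to match.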

Data used:

dat <- read.table(text="  ID      Season      Year       Weekday
1       Winter      2017       Monday
2       Winter      2018       Tuesday
3       Summer      2017       Saturday
4       Summer      2018       Wednesday", header = TRUE)

You can simply use as.numeric() to convert a factor to numeric. Each value is replaced by the integer code of its factor level:

library(dplyr)

### Change factor levels to the levels you specified
otest_xgb$Season  <- factor(otest_xgb$Season , levels = c("Winter", "Summer"))
otest_xgb$Year    <- factor(otest_xgb$Year   , levels = c(2017, 2018))
otest_xgb$Weekday <- factor(otest_xgb$Weekday, levels = c("Monday", "Tuesday", "Wednesday"))

otest_xgb %>% 
  dplyr::mutate_at(c("Season", "Year", "Weekday"), as.numeric)


# ID Season Year Weekday
# 1  1      1    1       1
# 2  2      1    2       2
# 3  3      2    1       1
# 4  4      2    2      NA   (NA: "Wednsday" in the sample data is not among the specified levels)
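A related caveat: as.numeric() on a factor returns the internal level codes, not the labels. That is what the question asks for, but for a factor such as Year it means 2017 becomes 1 rather than 2017. The sketch below, on a made-up vector, shows both the codes and the usual way to recover the original numbers.

```r
# Hypothetical example vector
year <- factor(c(2017, 2018, 2017, 2018), levels = c(2017, 2018))

codes  <- as.numeric(year)                # level codes: 1 2 1 2
values <- as.numeric(as.character(year))  # original labels: 2017 2018 2017 2018

print(codes)
print(values)
```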

Once you have converted Season, Year and Weekday to factors, use this code to view the dummy (indicator) variable coding for each of them:

# Column names are case-sensitive and must match the data frame
contrasts(factor(dat$Season))
contrasts(factor(dat$Year))
contrasts(factor(dat$Weekday))
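contrasts() returns the coding matrix of a factor rather than one indicator column per row. To get per-row indicator columns, base R's model.matrix() can be used. The sketch below assumes a small made-up data frame `d`; the column name `SeasonSummer` is what R generates from the variable name and level under the default treatment contrasts.

```r
# Hypothetical data frame for illustration
d <- data.frame(Season = factor(c("Winter", "Winter", "Summer", "Summer"),
                                levels = c("Winter", "Summer")))

# One row per observation; Winter is the reference level, so the matrix has
# an intercept column plus a single 0/1 column for Summer
dummies <- model.matrix(~ Season, data = d)
print(dummies)
```

Whether integer codes or indicator columns are the better input depends on the model. Integer codes impose an ordering that a nominal variable such as Season does not have.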
