
How to convert factor levels to integers in R

I have the following data frame in R:

  ID      Season      Year       Weekday
  1       Winter      2017       Monday
  2       Winter      2018       Tuesday
  3       Summer      2017       Monday
  4       Summer      2018       Wednsday

I want to convert these factor levels to integers. This is the data frame I want:

  ID      Season      Year       Weekday
  1       1           1          1
  2       1           2          2
  3       2           1          1
  4       2           2          3

  Winter = 1,Summer =2
  2017 = 1 , 2018 = 2
  Monday = 1,Tuesday = 2,Wednesday = 3

Currently I am writing a nested ifelse for each of the three columns above:

  otest_xgb$Weekday <- as.integer(ifelse(otest_xgb$Weekday == "Monday",1,
                                   ifelse(otest_xgb$Weekday == "Tuesday",2,
                                          ifelse(otest_xgb$Weekday == "Wednesday",3,
                                                 ifelse(otest_xgb$Weekday == "Thursday",4,5)))))

Is there a way to avoid writing these long ifelse chains?

> m <- dat
> m[] <- lapply(dat, function(x) as.integer(factor(x, unique(x))))
> m
  ID Season Year Weekday
1  1      1    1       1
2  2      1    2       2
3  3      2    1       1
4  4      2    2       3
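
To illustrate why this works: passing unique(x) as the levels makes the integer codes follow the order of first appearance instead of alphabetical order. A minimal sketch, where the vector name season is made up for illustration and holds the same values as the Season column in the question:

```r
# Hypothetical vector with the same values as the Season column
season <- c("Winter", "Winter", "Summer", "Summer")

# Default levels are sorted alphabetically: Summer = 1, Winter = 2
codes_default <- as.integer(factor(season))

# Levels taken from unique(): order of first appearance, Winter = 1, Summer = 2
codes_first_seen <- as.integer(factor(season, levels = unique(season)))

print(codes_default)     # 2 2 1 1
print(codes_first_seen)  # 1 1 2 2
```

Note that the codes then depend on row order, so sorting the data differently would change the mapping.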

We can use match together with the unique elements of each column:

library(dplyr)
# funs() has been deprecated since dplyr 0.8.0; across() requires dplyr >= 1.0.0
dat %>%
      mutate(across(everything(), ~ match(.x, unique(.x))))
#   ID Season Year Weekday
#1  1      1    1       1
#2  2      1    2       2
#3  3      2    1       1
#4  4      2    2       3
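
The same idea should also work without dplyr. A minimal base R sketch, assuming a data frame shaped like the one in the question (the names dat_demo and cols are made up for illustration):

```r
# Hypothetical copy of the question's data
dat_demo <- data.frame(
  ID      = 1:4,
  Season  = c("Winter", "Winter", "Summer", "Summer"),
  Year    = c(2017, 2018, 2017, 2018),
  Weekday = c("Monday", "Tuesday", "Monday", "Wednesday"),
  stringsAsFactors = FALSE
)

# Replace each selected column by the position of its value among the unique values
cols <- c("Season", "Year", "Weekday")
dat_demo[cols] <- lapply(dat_demo[cols], function(x) match(x, unique(x)))

print(dat_demo)
```

Restricting the conversion to cols leaves the ID column untouched.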

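Ordered and nominal factor variables need to be handled separately. Converting a factor column directly to integer or numeric gives values that follow the lexicographic (alphabetical) order of the levels.

Here Weekday is conceptually ordinal, Year is an integer, and Season is usually nominal. This is subjective, though, and depends on the type of analysis required.

For example, when you convert directly from a factor to an integer variable, Wednesday in the Weekday column gets a higher value than both Saturday and Tuesday: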

 dat[] <- lapply(dat, function(x)as.integer(factor(x)))
 dat 

#  ID Season Year Weekday
#1  1      2    1       1
#2  2      2    2       3
#3  3      1    1       2   (Saturday)
#4  4      1    2       4   (Wednesday): assigned value greater than that of Saturday

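Therefore, only the Season and Year columns can be converted from factor to integer directly. Note that this works for the Year column because its lexicographic order matches its ordinal order.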

dat[c('Season', 'Year')] <- lapply(dat[c('Season', 'Year')], 
                                   function(x) as.integer(factor(x)))

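Weekday needs to be converted from an ordered factor that has the desired level order. Skipping this may be harmless for general aggregation, but it can greatly affect the results when fitting statistical models.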

dat$Weekday <- as.integer(factor(dat$Weekday, 
                          levels = c("Monday", "Tuesday", "Wednesday", "Thursday", 
                                     "Friday", "Saturday", "Sunday"), ordered = TRUE))

dat
#  ID Season Year Weekday
#1  1      2    1       1
#2  2      2    2       2
#3  3      1    1       6  (Saturday)
#4  4      1    2       3  (Wednesday): assigned value less than that of Saturday
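
As a cross-check, the same codes can be obtained by looking each day up in a fixed vector of day names with match(). This is only a sketch; the names weekday_order, days and day_codes are made up for illustration:

```r
# Desired order of the days, Monday first
weekday_order <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                   "Friday", "Saturday", "Sunday")

# Hypothetical vector with the same values as the Weekday column used above
days <- c("Monday", "Tuesday", "Saturday", "Wednesday")

# Position of each day in weekday_order; a value not found there would give NA
day_codes <- match(days, weekday_order)

print(day_codes)  # 1 2 6 3
```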

Data used:

dat <- read.table(text="  ID      Season      Year       Weekday
1       Winter      2017       Monday
2       Winter      2018       Tuesday
3       Summer      2017       Saturday
4       Summer      2018       Wednesday", header = TRUE)

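You can simply use as.numeric() to convert a factor to numbers. Each value is changed to the integer code of its factor level. Any value that is not listed in levels becomes NA, which is presumably why row 4 (spelled "Wednsday" in the question's sample data) shows NA in the output: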

library(dplyr)

### Change factor levels to the levels you specified
otest_xgb$Season  <- factor(otest_xgb$Season , levels = c("Winter", "Summer"))
otest_xgb$Year    <- factor(otest_xgb$Year   , levels = c(2017, 2018))
otest_xgb$Weekday <- factor(otest_xgb$Weekday, levels = c("Monday", "Tuesday", "Wednesday"))

otest_xgb %>% 
  dplyr::mutate_at(c("Season", "Year", "Weekday"), as.numeric)


# ID Season Year Weekday
# 1  1      1    1       1
# 2  2      1    2       2
# 3  3      2    1       1
# 4  4      2    2      NA
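
One caveat with this approach: as.numeric() on a factor returns the level codes, not the labels. A small sketch with a made-up factor year_f shows the difference:

```r
# Hypothetical factor whose labels look like numbers
year_f <- factor(c(2017, 2018, 2017, 2018))

# Level codes: position of each value in levels(year_f)
year_codes <- as.numeric(year_f)

# Original numbers: go through the character labels first
year_values <- as.numeric(as.character(year_f))

print(year_codes)   # 1 2 1 2
print(year_values)  # 2017 2018 2017 2018
```

For this question the level codes are what is wanted, but the second form is the one to use when the original numbers are needed back.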

After converting Season, Year and Weekday to factors, use this code to view their dummy (indicator) variable coding:

# Column names are capitalized to match the data frame
contrasts(factor(dat$Season))
contrasts(factor(dat$Year))
contrasts(factor(dat$Weekday))
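
contrasts() only displays the coding matrix for a factor. If actual indicator columns are needed, one possible route is model.matrix(), sketched below with a made-up data frame demo and R's default treatment contrasts:

```r
# Hypothetical one-column data frame; Winter is the reference level
demo <- data.frame(
  Season = factor(c("Winter", "Winter", "Summer", "Summer"),
                  levels = c("Winter", "Summer"))
)

# Expand the factor into an intercept plus one 0/1 column per non-reference level
dummies <- model.matrix(~ Season, data = demo)

print(dummies)
```

The result has an "(Intercept)" column and a "SeasonSummer" column that is 1 for the Summer rows and 0 otherwise.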

