How to convert factor levels to integer in R
I have the following data frame in R:
ID Season Year Weekday
1 Winter 2017 Monday
2 Winter 2018 Tuesday
3 Summer 2017 Monday
4 Summer 2018 Wednsday
I want to convert these factor levels to integers. This is the data frame I want:
ID Season Year Weekday
1 1 1 1
2 1 2 2
3 2 1 1
4 2 2 3
Winter = 1, Summer = 2
2017 = 1, 2018 = 2
Monday = 1, Tuesday = 2, Wednesday = 3
Currently I am writing a nested ifelse for each of the three columns:
otest_xgb$Weekday <- as.integer(ifelse(otest_xgb$Weekday == "Monday",1,
ifelse(otest_xgb$Weekday == "Tuesday",2,
ifelse(otest_xgb$Weekday == "Wednesday",3,
ifelse(otest_xgb$Weekday == "Thursday",4,5)))))
Is there any way to avoid writing long ifelse chains?
m <- dat
m[] <- lapply(dat, function(x) as.integer(factor(x, levels = unique(x))))
m
# ID Season Year Weekday
#1 1 1 1 1
#2 2 1 2 2
#3 3 2 1 1
#4 4 2 2 3
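To see why the second argument to factor() matters, it helps to compare the integer codes with and without it on a single column. This is only a minimal sketch; the vector name `season` is made up for illustration. By default factor() sorts the levels, while passing unique(x) as the levels keeps the order of first appearance.

```r
# Illustrative vector (hypothetical, mirrors the Season column)
season <- c("Winter", "Winter", "Summer", "Summer")

# Default: levels are sorted, so "Summer" comes before "Winter"
default_codes <- as.integer(factor(season))

# Levels in order of first appearance: "Winter" becomes level 1
appearance_codes <- as.integer(factor(season, levels = unique(season)))

print(default_codes)     # 2 2 1 1
print(appearance_codes)  # 1 1 2 2
```

The second variant matches the numbering asked for in the question only because Winter happens to appear before Summer in the data.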
We can use match with the unique elements of each column:
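library(dplyr)
# across() requires dplyr >= 1.0.0; it replaces the deprecated mutate_all(funs(...)) idiom
dat %>%
  mutate(across(everything(), ~ match(.x, unique(.x))))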
# ID Season Year Weekday
#1 1 1 1 1
#2 2 1 2 2
#3 3 2 1 1
#4 4 2 2 3
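Ordinal and nominal factor variables need to be handled separately. Converting a factor column directly to integer or numeric yields codes that follow the alphabetical (lexicographic) order of the levels.
Here Weekday is conceptually ordinal, Year is an integer, and Season is usually nominal. This is subjective, though, and depends on the kind of analysis you need.
For example, if you convert directly from factor to integer, Wednesday in the Weekday column gets a higher value than both Saturday and Tuesday: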
dat[] <- lapply(dat, function(x)as.integer(factor(x)))
dat
# ID Season Year Weekday
#1 1 2 1 1
#2 2 2 2 3
#3 3 1 1 2 (Saturday)
#4 4 1 2 4 (Wednesday): assigned value greater than that of Saturday
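So only the Season and Year columns can be converted from factor to integer directly. Note that this works for the Year column because its lexicographic order matches its ordinal order. Starting again from the original data: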
dat[c('Season', 'Year')] <- lapply(dat[c('Season', 'Year')],
function(x) as.integer(factor(x)))
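Weekday needs to be converted from an ordered factor whose levels are in the desired order. The difference may be harmless for simple aggregation, but it can greatly affect the results of a statistical model.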
dat$Weekday <- as.integer(factor(dat$Weekday,
levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday"), ordered = TRUE))
dat
# ID Season Year Weekday
#1 1 2 1 1
#2 2 2 2 2
#3 3 1 1 6 (Saturday)
#4 4 1 2 3 (Wednesday): assigned value less than that of Saturday
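An equivalent way to get the same codes, sketched here as an alternative rather than as this answer's method, is to look each value up in a vector of day names with match(). The names `day_order` and `weekday` are assumptions for illustration. Values that are not found in `day_order`, such as a misspelling, come back as NA.

```r
# Desired ordering of the days (position in this vector becomes the code)
day_order <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# Hypothetical input, including one misspelled value
weekday <- c("Monday", "Tuesday", "Saturday", "Wednesday", "Wednsday")

# match() returns the position of each value in day_order, or NA if absent
weekday_codes <- match(weekday, day_order)

print(weekday_codes)  # 1 2 6 3 NA
```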
Data used:
dat <- read.table(text=" ID Season Year Weekday
1 Winter 2017 Monday
2 Winter 2018 Tuesday
3 Summer 2017 Saturday
4 Summer 2018 Wednesday", header = TRUE)
You can simply use as.numeric() to convert a factor to numbers. Each value is replaced by the integer code of its factor level:
library(dplyr)
### Change factor levels to the levels you specified
otest_xgb$Season <- factor(otest_xgb$Season , levels = c("Winter", "Summer"))
otest_xgb$Year <- factor(otest_xgb$Year , levels = c(2017, 2018))
otest_xgb$Weekday <- factor(otest_xgb$Weekday, levels = c("Monday", "Tuesday", "Wednesday"))
otest_xgb %>%
dplyr::mutate_at(c("Season", "Year", "Weekday"), as.numeric)
# ID Season Year Weekday
# 1 1 1 1 1
# 2 2 1 2 2
# 3 3 2 1 1
# 4 4 2 2 NA (the question's sample data spells it "Wednsday", which is not among the levels)
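One caveat is worth illustrating: as.numeric() on a factor returns the internal level codes, not the labels. That is what is wanted here, but it is easy to trip over when the labels are themselves numbers, such as years. A small sketch with a made-up vector `year`:

```r
# Hypothetical factor whose labels look like numbers
year <- factor(c(2017, 2018, 2017, 2018))

# Level codes: what the conversion above relies on
year_codes <- as.numeric(year)

# Original values: go through character first
year_values <- as.numeric(as.character(year))

print(year_codes)   # 1 2 1 2
print(year_values)  # 2017 2018 2017 2018
```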
After converting Season, Year, and Weekday to factors, use this code to get their dummy (indicator) variable coding:
contrasts(factor(dat$Season))
contrasts(factor(dat$Year))
contrasts(factor(dat$Weekday))
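Note that contrasts() only shows the coding matrix for a factor. If the goal is actual 0/1 columns for every row, one common route is model.matrix(). The following is a sketch, assuming a data frame shaped like the `dat` example used earlier in this thread.

```r
# Small stand-in for the data frame (assumed layout)
dat <- data.frame(
  Season  = factor(c("Winter", "Winter", "Summer", "Summer")),
  Weekday = factor(c("Monday", "Tuesday", "Saturday", "Wednesday"))
)

# Coding matrix for one factor: one row per level,
# one column per non-reference level
season_coding <- contrasts(dat$Season)

# Design matrix: an intercept plus 0/1 indicator columns, one row per observation
dummies <- model.matrix(~ Season + Weekday, data = dat)

print(season_coding)
print(colnames(dummies))
```

With the default treatment contrasts, the first level of each factor is the reference and gets no column of its own.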