How to convert factor levels to integers in R
I have the following data frame in R:
ID Season Year Weekday
1 Winter 2017 Monday
2 Winter 2018 Tuesday
3 Summer 2017 Monday
4 Summer 2018 Wednsday
I want to convert these factor levels to integers. The following is my desired data frame:
ID Season Year Weekday
1 1 1 1
2 1 2 2
3 2 1 1
4 2 2 3
Winter = 1, Summer = 2
2017 = 1, 2018 = 2
Monday = 1, Tuesday = 2, Wednesday = 3
Currently, I am using nested ifelse calls for all three columns, for example:
otest_xgb$Weekday <- as.integer(ifelse(otest_xgb$Weekday == "Monday",1,
ifelse(otest_xgb$Weekday == "Tuesday",2,
ifelse(otest_xgb$Weekday == "Wednesday",3,
ifelse(otest_xgb$Weekday == "Thursday",4,5)))))
Is there any way to avoid writing long nested ifelse statements?
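For comparison, the explicit mapping in the question can be written without nesting as a named lookup vector. This is a minimal sketch; the names `weekday` and `weekday_codes` are hypothetical, and unlike the ifelse fallback of 5, labels missing from the table become NA.

```r
# Hypothetical example vector; in the question this would be otest_xgb$Weekday
weekday <- c("Monday", "Tuesday", "Monday", "Wednesday")

# Named lookup vector encoding the desired label -> code mapping
weekday_codes <- c(Monday = 1L, Tuesday = 2L, Wednesday = 3L, Thursday = 4L)

# Index by name (as.character() matters if weekday is a factor, because
# indexing by a factor would use its integer codes instead of its labels).
# unname() drops the names that indexing copies over.
weekday_int <- unname(weekday_codes[as.character(weekday)])
weekday_int  # 1 2 1 3
```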
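One option is to turn each column into a factor whose levels are in order of first appearance, then take the integer codes:
m <- dat
m[] <- lapply(dat, function(x) as.integer(factor(x, levels = unique(x))))
m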
ID Season Year Weekday
1 1 1 1 1
2 2 1 2 2
3 3 2 1 1
4 4 2 2 3
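The key is `levels = unique(x)`: without it, factor() sorts the levels alphabetically. A small sketch on a hypothetical `season` vector shows the difference; note that first-appearance order matches the desired output only because the data happen to list Winter, 2017 and Monday first.

```r
season <- c("Winter", "Winter", "Summer", "Summer")

# Default factor(): levels sorted alphabetically, so Summer = 1
as.integer(factor(season))                           # 2 2 1 1

# levels = unique(x): levels in order of first appearance, so Winter = 1
as.integer(factor(season, levels = unique(season)))  # 1 1 2 2
```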
We can use match() with the unique() elements:
library(dplyr)
dat %>%
  mutate(across(everything(), ~ match(.x, unique(.x))))
# ID Season Year Weekday
#1 1 1 1 1
#2 2 1 2 2
#3 3 2 1 1
#4 4 2 2 3
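The same idea works in base R without dplyr: match(x, unique(x)) returns each value's position among the distinct values of x. A sketch, assuming a hypothetical `q_dat` that rebuilds the question's data (with Wednesday spelled correctly):

```r
q_dat <- data.frame(
  ID = 1:4,
  Season = c("Winter", "Winter", "Summer", "Summer"),
  Year = c(2017, 2018, 2017, 2018),
  Weekday = c("Monday", "Tuesday", "Monday", "Wednesday")
)

# Replace every column with the positions of its values among unique(values)
codes <- q_dat
codes[] <- lapply(q_dat, function(x) match(x, unique(x)))
codes
```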
Ordered (ordinal) and nominal factor variables need to be handled separately. Converting a column with factor() and then to integer or numeric gives codes based on the default level order, which for character data is alphabetical (lexicographical).
Here Weekday is conceptually ordinal, Year is an integer, and Season is generally nominal. However, this is subjective and depends on the kind of analysis required.
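In R terms, an ordinal variable maps to an ordered factor and a nominal one to an unordered factor. A short sketch with hypothetical vectors shows how the two behave under comparison:

```r
weekday_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                    "Friday", "Saturday", "Sunday")

# Ordinal: an ordered factor, so comparisons follow the level order
wd <- factor(c("Monday", "Saturday"), levels = weekday_levels, ordered = TRUE)
wd[1] < wd[2]  # TRUE

# Nominal: an unordered factor has no meaningful order
s <- factor(c("Winter", "Summer"))
s[1] < s[2]    # NA, with a warning that '<' is not meaningful for factors
```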
For example, when you convert directly from factor to integer, Wednesday in the Weekday column gets a higher value than both Saturday and Tuesday:
dat_lex <- dat  # work on a copy so dat keeps its original labels for the code below
dat_lex[] <- lapply(dat_lex, function(x) as.integer(factor(x)))
dat_lex
# ID Season Year Weekday
#1 1 2 1 1
#2 2 2 2 3
#3 3 1 1 2 (Saturday)
#4 4 1 2 4 (Wednesday): assigned a value greater than that of Saturday
Therefore, you can convert directly from factor to integer only for the Season and Year columns. For the Year column this works because the lexicographical order of the years matches their chronological order.
dat[c('Season', 'Year')] <- lapply(dat[c('Season', 'Year')],
function(x) as.integer(factor(x)))
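Whether the sorted order is lexicographical depends on the column type: factor() sorts numeric input numerically, but character input as text. Four-digit years sort the same way either way; mixed-width numbers stored as text do not. A sketch with hypothetical values:

```r
# Numeric input: levels sorted numerically ("9", "10")
yr_num <- c(10, 9, 10)
as.integer(factor(yr_num))  # 2 1 2

# Character input: levels sorted as text ("10" comes before "9")
yr_chr <- c("10", "9", "10")
as.integer(factor(yr_chr))  # 1 2 1
```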
Weekday needs to be converted via an ordered factor with the levels in the desired order. Skipping this may be harmless for general aggregation, but it can drastically affect results when fitting statistical models.
dat$Weekday <- as.integer(factor(dat$Weekday,
levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday"), ordered = TRUE))
dat
# ID Season Year Weekday
#1 1 2 1 1
#2 2 2 2 2
#3 3 1 1 6 (Saturday)
#4 4 1 2 3 (Wednesday): assigned value less than that of Saturday
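Because the integer codes are just positions in the level vector, match() against the same vector gives identical codes, and any label not in the vector (such as a misspelling) becomes NA in both. A sketch with a hypothetical input `x`; setdiff() is one way to list offending labels before converting:

```r
weekday_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                    "Friday", "Saturday", "Sunday")
x <- c("Monday", "Tuesday", "Saturday", "Wednsday")  # note the misspelling

# Codes from an ordered factor are positions in weekday_levels ...
via_factor <- as.integer(factor(x, levels = weekday_levels, ordered = TRUE))
# ... so match() against the same vector yields the same codes
via_match <- match(x, weekday_levels)
identical(via_factor, via_match)  # TRUE
via_match                         # 1 2 6 NA

# Labels that are not valid levels
setdiff(x, weekday_levels)        # "Wednsday"
```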
Data used:
dat <- read.table(text=" ID Season Year Weekday
1 Winter 2017 Monday
2 Winter 2018 Tuesday
3 Summer 2017 Saturday
4 Summer 2018 Wednesday", header = TRUE)
You can simply use as.numeric() to convert a factor to numbers: each value is replaced by the integer code of its factor level. So first set the levels in the order you want:
library(dplyr)
### Change factor levels to the levels you specified
otest_xgb$Season <- factor(otest_xgb$Season, levels = c("Winter", "Summer"))
otest_xgb$Year <- factor(otest_xgb$Year, levels = c(2017, 2018))
otest_xgb$Weekday <- factor(otest_xgb$Weekday, levels = c("Monday", "Tuesday", "Wednesday"))
otest_xgb %>%
dplyr::mutate_at(c("Season", "Year", "Weekday"), as.numeric)
# ID Season Year Weekday
# 1 1 1 1 1
# 2 2 1 2 2
# 3 3 2 1 1
# 4 4 2 2 NA   ("Wednsday" in the question's data is not one of the levels, so it becomes NA)
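One caveat with as.numeric(): it always returns the level codes, never the labels. For a column like Year that is usually what you want here, but if you ever need the original years back, convert through the labels. A sketch with a hypothetical `yr` factor:

```r
yr <- factor(c(2017, 2018, 2017, 2018))

as.numeric(yr)                 # 1 2 1 2 (level codes)
as.numeric(as.character(yr))   # 2017 2018 2017 2018 (original values)
as.numeric(levels(yr))[yr]     # same result, converting each level only once
```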
Once you have converted Season, Year and Weekday to factors, you can use contrasts() to inspect the dummy (indicator) coding R will use for them in models:
contrasts(factor(dat$Season))
contrasts(factor(dat$Year))
contrasts(factor(dat$Weekday))
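contrasts() only shows the coding matrix, with one row per level. To get actual indicator columns with one row per observation, model.matrix() expands the factor using that coding. A sketch on a hypothetical data frame `d`:

```r
d <- data.frame(
  Season = factor(c("Winter", "Winter", "Summer", "Summer"),
                  levels = c("Winter", "Summer"))
)

contrasts(d$Season)                    # 2 x 1 matrix: one row per level

# Treatment coding: intercept plus one indicator per non-reference level
mm <- model.matrix(~ Season, data = d)
mm                                     # columns (Intercept), SeasonSummer

# Dropping the intercept gives one indicator column per level
mm_all <- model.matrix(~ Season - 1, data = d)
mm_all                                 # columns SeasonWinter, SeasonSummer
```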