[英]how to convert factor levels to integer in r
我在 R 中有以下數據框
ID Season Year Weekday
1 Winter 2017 Monday
2 Winter 2018 Tuesday
3 Summer 2017 Monday
4 Summer 2018 Wednsday
我想將這些因子級別轉換為整數,以下是我想要的數據框
ID Season Year Weekday
1 1 1 1
2 1 2 2
3 2 1 1
4 2 2 3
Winter = 1,Summer =2
2017 = 1 , 2018 = 2
Monday = 1,Tuesday = 2,Wednesday = 3
目前,我正在為以上 3 做ifelse
otest_xgb$Weekday <- as.integer(ifelse(otest_xgb$Weekday == "Monday",1,
ifelse(otest_xgb$Weekday == "Tuesday",2,
ifelse(otest_xgb$Weekday == "Wednesday",3,
ifelse(otest_xgb$Weekday == "Thursday",4,5)))))
有什么辦法可以避免寫長的ifelse
嗎?
m=dat
> m[]=lapply(dat,function(x)as.integer(factor(x,unique(x))))
> m
ID Season Year Weekday
1 1 1 1 1
2 2 1 2 2
3 3 2 1 1
4 4 2 2 3
我們可以將match
與unique
元素一起使用
library(dplyr)
dat %>%
mutate_all(funs(match(., unique(.))))
# ID Season Year Weekday
#1 1 1 1 1
#2 2 1 2 2
#3 3 2 1 1
#4 4 2 2 3
有序和名義因子變量需要分別處理。 直接將因子列轉換為整數或數字將提供字典意義上的值。
這里Weekday
在概念上是有序的, Year
是整數, Season
通常是名義上的。 然而,這又是主觀的,取決於所需的分析類型。
例如。 當您直接從因子轉換為整數變量時。 在Weekday
列中, Wednesday
將獲得比星期六和星期二更高的值:
dat[] <- lapply(dat, function(x)as.integer(factor(x)))
dat
# ID Season Year Weekday
#1 1 2 1 1
#2 2 2 2 3
#3 3 1 1 2 (Saturday)
#4 4 1 2 4 (Wednesday): assigned value greater than that ofSaturday
因此,您只能將“ Season
和“ Year
列的因子直接轉換為整數。 可能會注意到,對於year
列,它可以正常工作,因為詞典意義與其序數意義相匹配。
dat[c('Season', 'Year')] <- lapply(dat[c('Season', 'Year')],
function(x) as.integer(factor(x)))
Weekday
需要從具有所需級別順序的有序因子變量轉換而來。 如果進行一般聚合可能無害,但在實施統計模型時會極大地影響結果。
dat$Weekday <- as.integer(factor(dat$Weekday,
levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday"), ordered = TRUE))
dat
# ID Season Year Weekday
#1 1 2 1 1
#2 2 2 2 2
#3 3 1 1 6 (Saturday)
#4 4 1 2 3 (Wednesday): assigned value less than that of Saturday
使用的數據:
dat <- read.table(text=" ID Season Year Weekday
1 Winter 2017 Monday
2 Winter 2018 Tuesday
3 Summer 2017 Saturday
4 Summer 2018 Wednesday", header = TRUE)
您可以簡單地使用as.numeric()
將因子轉換為數字。 每個值將更改為該因子級別表示的相應整數:
library(dplyr)
### Change factor levels to the levels you specified
otest_xgb$Season <- factor(otest_xgb$Season , levels = c("Winter", "Summer"))
otest_xgb$Year <- factor(otest_xgb$Year , levels = c(2017, 2018))
otest_xgb$Weekday <- factor(otest_xgb$Weekday, levels = c("Monday", "Tuesday", "Wednesday"))
otest_xgb %>%
dplyr::mutate_at(c("Season", "Year", "Weekday"), as.numeric)
# ID Season Year Weekday
# 1 1 1 1 1
# 2 2 1 2 2
# 3 3 2 1 1
# 4 4 2 2 NA
將季節、年份和工作日轉換為因子后,請使用此代碼更改為虛擬指標變量
contrasts(factor(dat$season)
contrasts(factor(dat$year)
contrasts(factor(dat$weekday)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.