How to convert factor levels to integer in R

I have the following data frame in R:

  ID      Season      Year       Weekday
  1       Winter      2017       Monday
  2       Winter      2018       Tuesday
  3       Summer      2017       Monday
  4       Summer      2018       Wednsday

I want to convert these factor levels to integers. This is the data frame I want:

  ID      Season      Year       Weekday
  1       1           1          1
  2       1           2          2
  3       2           1          1
  4       2           2          3

  Winter = 1,Summer =2
  2017 = 1 , 2018 = 2
  Monday = 1,Tuesday = 2,Wednesday = 3

Currently I am writing a nested ifelse for each of the three columns above:

  otest_xgb$Weekday <- as.integer(ifelse(otest_xgb$Weekday == "Monday",1,
                                   ifelse(otest_xgb$Weekday == "Tuesday",2,
                                          ifelse(otest_xgb$Weekday == "Wednesday",3,
                                                 ifelse(otest_xgb$Weekday == "Thursday",4,5)))))

Is there any way to avoid writing a long ifelse?

m <- dat
m[] <- lapply(dat, function(x) as.integer(factor(x, unique(x))))
m
  ID Season Year Weekday
1  1      1    1       1
2  2      1    2       2
3  3      2    1       1
4  4      2    2       3
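
The key step above is passing unique(x) as the levels, which numbers the values by first appearance instead of alphabetically. A minimal sketch on a single vector (the vector x here is made up for illustration and is not part of the original answer):

```r
# Hypothetical example vector, mirroring the Season column
x <- c("Winter", "Winter", "Summer", "Summer")

# Default: levels are sorted, so Summer comes before Winter
default_codes <- as.integer(factor(x))

# Levels in order of first appearance, so Winter = 1 and Summer = 2
appearance_codes <- as.integer(factor(x, levels = unique(x)))

default_codes      # 2 2 1 1
appearance_codes   # 1 1 2 2
```

This only gives the desired numbering when the values happen to appear in the order you want; otherwise the levels have to be spelled out explicitly.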

We can use match together with the unique elements:

library(dplyr)
# funs() is deprecated since dplyr 0.8.0; across() needs dplyr >= 1.0.0
dat %>%
      mutate(across(everything(), ~ match(.x, unique(.x))))
#   ID Season Year Weekday
#1  1      1    1       1
#2  2      1    2       2
#3  3      2    1       1
#4  4      2    2       3
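
To see what match and unique do on their own, here is a small base R sketch on one vector (the vector x is a made-up example). match returns, for each element, the position of its first occurrence in the lookup table, so the result should agree with the factor-based approach that uses unique(x) as levels:

```r
# Hypothetical example vector, mirroring the Weekday column
x <- c("Monday", "Tuesday", "Monday", "Wednesday")

lookup <- unique(x)        # "Monday" "Tuesday" "Wednesday"
codes  <- match(x, lookup) # position of each value in the lookup table

codes                      # 1 2 1 3

# Same integer codes as a factor whose levels follow first appearance
same_as_factor <- identical(codes, as.integer(factor(x, levels = lookup)))
```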

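Ordinal and nominal factor variables need to be handled separately. Converting a factor column directly to integer or numeric gives values that follow the lexicographic (alphabetical) order of the levels.

Here Weekday is conceptually ordinal, Year is an integer, and Season is usually nominal. However, this is subjective and depends on the type of analysis required.

For example, when you convert directly from factor to integer, Wednesday in the Weekday column gets a higher value than both Saturday and Tuesday: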

 dat2 <- dat   # work on a copy so dat keeps its original values for the steps below
 dat2[] <- lapply(dat, function(x) as.integer(factor(x)))
 dat2

#  ID Season Year Weekday
#1  1      2    1       1
#2  2      2    2       3
#3  3      1    1       2   (Saturday)
#4  4      1    2       4   (Wednesday): assigned value greater than that of Saturday

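Therefore, only the Season and Year columns can be converted directly from factor to integer. Note that this works for the Year column because its lexicographic order matches its ordinal order.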

dat[c('Season', 'Year')] <- lapply(dat[c('Season', 'Year')], 
                                   function(x) as.integer(factor(x)))

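Weekday needs to be converted from an ordered factor variable with the desired level order. Skipping this may be harmless for general aggregation, but it can greatly affect the results when fitting a statistical model.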

dat$Weekday <- as.integer(factor(dat$Weekday, 
                          levels = c("Monday", "Tuesday", "Wednesday", "Thursday", 
                                     "Friday", "Saturday", "Sunday"), ordered = TRUE))

dat
#  ID Season Year Weekday
#1  1      2    1       1
#2  2      2    2       2
#3  3      1    1       6  (Saturday)
#4  4      1    2       3  (Wednesday): assigned value less than that of Saturday
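
As an alternative to both the nested ifelse and the explicit factor levels, the same mapping can be sketched with a named lookup vector. This is a different technique from the one used above, and the name weekday_codes is introduced here only for illustration:

```r
# Hypothetical lookup table: names are the labels, values are the codes
weekday_codes <- c(Monday = 1L, Tuesday = 2L, Wednesday = 3L, Thursday = 4L,
                   Friday = 5L, Saturday = 6L, Sunday = 7L)

# Example column, stored as a factor like the Weekday column
weekday <- factor(c("Monday", "Tuesday", "Saturday", "Wednesday"))

# Index by name. as.character() matters: indexing with a factor would
# use its internal integer codes instead of its labels.
codes <- unname(weekday_codes[as.character(weekday)])

codes   # 1 2 6 3
```

A label that is missing from the lookup vector yields NA, which makes misspelled values easy to spot.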

Data used:

dat <- read.table(text="  ID      Season      Year       Weekday
1       Winter      2017       Monday
2       Winter      2018       Tuesday
3       Summer      2017       Saturday
4       Summer      2018       Wednesday", header = TRUE)

You can simply use as.numeric() to convert factors to numbers. Each value is replaced by the integer code of its factor level:

library(dplyr)

### Change factor levels to the levels you specified
otest_xgb$Season  <- factor(otest_xgb$Season , levels = c("Winter", "Summer"))
otest_xgb$Year    <- factor(otest_xgb$Year   , levels = c(2017, 2018))
otest_xgb$Weekday <- factor(otest_xgb$Weekday, levels = c("Monday", "Tuesday", "Wednesday"))

otest_xgb %>% 
  dplyr::mutate_at(c("Season", "Year", "Weekday"), as.numeric)


# ID Season Year Weekday
# 1  1      1    1       1
# 2  2      1    2       2
# 3  3      2    1       1
# 4  4      2    2      NA   (the question's data spells it "Wednsday", which matches none of the specified levels)
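
One thing to keep in mind with this approach: as.numeric() on a factor returns the level codes, not the labels. That is what is wanted here, but it differs from recovering the original numbers. A minimal sketch with a made-up year vector:

```r
# Hypothetical factor whose labels look like numbers
year <- factor(c(2017, 2018, 2017, 2018))

level_codes  <- as.numeric(year)                # codes of the levels
label_values <- as.numeric(as.character(year))  # the original numbers

level_codes    # 1 2 1 2
label_values   # 2017 2018 2017 2018
```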

After converting Season, Year and Weekday to factors, use this code to see how they are coded as dummy indicator variables:

contrasts(factor(dat$Season))
contrasts(factor(dat$Year))
contrasts(factor(dat$Weekday))
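
Note that contrasts() only displays the coding matrix for a factor; it does not add columns to the data. If the indicator columns themselves are needed, one possible sketch uses model.matrix() from base R. The small data frame below is rebuilt by hand for illustration, and the column names shown assume R's default treatment contrasts:

```r
# Hypothetical data frame with the two categorical columns as factors
dat_f <- data.frame(
  Season  = factor(c("Winter", "Winter", "Summer", "Summer")),
  Weekday = factor(c("Monday", "Tuesday", "Saturday", "Wednesday"))
)

# One 0/1 column per non-reference level, plus an intercept column
dummies <- model.matrix(~ Season + Weekday, data = dat_f)

colnames(dummies)
# "(Intercept)" "SeasonWinter" "WeekdaySaturday" "WeekdayTuesday" "WeekdayWednesday"
```

The first level of each factor (Summer and Monday here, by alphabetical order) is the reference level and gets no column of its own.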

