
Create dummy variables from levels

This is what I want:

age colorred colorgreen colorblue
   1        1          0         0
   2        0          1         0
   3        0          0         1

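As long as the data frame contains enough rows to represent every level of the factor, I can create this output easily. I tend to use the dummies package, which works well: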

library(dummies)
df <- data.frame(
    age = c(1,2,3)
    , color = c("red", "green", "blue")
)
df$color <- factor(as.character(df$color), ordered = FALSE, levels = c("red", "green", "blue"))
str(df)
df <- dummy.data.frame(df, names = c("color"))
df

However, if the data frame does not contain enough rows to cover every level, I cannot get the desired format:

library(dummies)

df <- data.frame(
    age = 33
    , color = "red"
)
df$color <- factor(as.character(df$color), ordered = FALSE, levels = c("red", "green", "blue"))
str(df)
df <- dummy.data.frame(df, names = c("color"))
df

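Is it possible to bake the transformation into some kind of model, so that it still converts correctly even when the data contains only one row?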

You don't really need any package to do this. In base R you can do:

my_columns <- c("red", "green", "blue")

df <- data.frame(
    age = c(1,2,3), 
    color = c("red", "green", "blue")
)

cbind(age = df$age, `colnames<-`(as.data.frame(t(sapply(df$color, 
      function(x) as.numeric(x == my_columns)))), my_columns))
#>   age red green blue
#> 1   1   1     0    0
#> 2   2   0     1    0
#> 3   3   0     0    1

df <- data.frame(
    age = 33, color = "red"
)

cbind(age = df$age, `colnames<-`(as.data.frame(t(sapply(df$color, 
      function(x) as.numeric(x == my_columns)))), my_columns))
#>   age red green blue
#> 1  33   1     0    0
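As an alternative sketch (not part of the original answer), base R's `model.matrix()` can also produce the indicator columns, provided the levels are declared on the factor up front. The formula `~ color - 1` drops the intercept so that every level gets its own column, and the resulting names happen to match the `colorred` / `colorgreen` / `colorblue` layout asked for in the question. The variable names `mm` and `out` are only illustrative.

```r
# One-row input, as in the question
df <- data.frame(age = 33, color = "red")

# Declare all levels explicitly so that unseen levels still get a column
df$color <- factor(df$color, levels = c("red", "green", "blue"))

# "- 1" removes the intercept, giving one indicator column per level
mm <- model.matrix(~ color - 1, data = df)

# Combine the untouched column with the indicator columns
out <- cbind(df["age"], as.data.frame(mm))
out
#>   age colorred colorgreen colorblue
#> 1  33        1          0         0
```

This relies on the factor carrying its full set of levels; a plain character column would only yield columns for the values that actually appear in the data.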

Edit

A more complete solution, which handles several columns at once, can be achieved by writing a function to take care of the logic:

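expand_factors <- function(df, columns)
{
  for(column in columns){
    # Character columns have no declared levels, so derive them from the data
    if(is.character(df[[column]])) df[[column]] <- factor(df[[column]])
    my_columns <- levels(df[[column]])
    # One row of 0/1 indicators per observation, one column per level
    mat <- t(sapply(df[[column]], function(x) as.numeric(x == my_columns)))
    new_cols <- setNames(as.data.frame(mat), my_columns)
    # Drop the original column and append the indicator columns
    df <- cbind(df[which(names(df) != column)], new_cols)
  }
  df
}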

So if I have this data frame:

df <- data.frame(age = 1:3,
                 shoe_size = 4:6,
                 colors = c("red", "green", "blue"),
                 fruits = c("apples", "bananas", "cherries"),
                 temp   = factor(rep("cold", 3), levels = c("hot", "cold")))

df
#>   age shoe_size colors   fruits temp
#> 1   1         4    red   apples cold
#> 2   2         5  green  bananas cold
#> 3   3         6   blue cherries cold

Then I can expand whichever factors I like by doing this:

expand_factors(df, c("colors", "fruits", "temp"))
#>   age shoe_size blue green red apples bananas cherries hot cold
#> 1   1         4    0     0   1      1       0        0   0    1
#> 2   2         5    0     1   0      0       1        0   0    1
#> 3   3         6    1     0   0      0       0        1   0    1
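To tie this back to the one-row case from the question, here is a small sketch of the same function applied to a single-row data frame. The function is repeated so the block runs on its own; the `one_row` data frame is a made-up example. The key assumption is that the column is already a factor with all of its levels declared. If it were a plain character column, the function would derive the levels from the data and produce only a `red` column.

```r
# Same function as above, repeated so this block is self-contained
expand_factors <- function(df, columns)
{
  for(column in columns){
    if(is.character(df[[column]])) df[[column]] <- factor(df[[column]])
    my_columns <- levels(df[[column]])
    mat <- t(sapply(df[[column]], function(x) as.numeric(x == my_columns)))
    new_cols <- setNames(as.data.frame(mat), my_columns)
    df <- cbind(df[which(names(df) != column)], new_cols)
  }
  df
}

# Hypothetical one-row input with the full set of levels declared
one_row <- data.frame(
  age   = 33,
  color = factor("red", levels = c("red", "green", "blue"))
)

out <- expand_factors(one_row, "color")
out
#>   age red green blue
#> 1  33   1     0    0
```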

Created on 2020-08-20 by the reprex package (v0.3.0)

