Create dummy variables from levels
This is what I want:
age colorred colorgreen colorblue
1 1 0 0
2 0 1 0
3 0 0 1
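As long as the data frame contains enough rows to represent every level of the factor, I can create the data easily. I tend to use the dummies package, which works well: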
library(dummies)
df <- data.frame(
age = c(1,2,3)
, color = c("red", "green", "blue")
)
df$color <- factor(as.character(df$color), ordered = FALSE, levels = c("red", "green", "blue"))
str(df)
df <- dummy.data.frame(df, names = c("color"))
df
However, if the data frame does not contain every level, I do not get the format I want:
library(dummies)
df <- data.frame(
age = 33
, color = "red"
)
df$color <- factor(as.character(df$color), ordered = FALSE, levels = c("red", "green", "blue"))
str(df)
df <- dummy.data.frame(df, names = c("color"))
df
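Is it possible to bake the transformation into some kind of model, so that it still converts correctly even when the data contains only one row?
You don't really need any package to do this. In base R you can do it like this: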
my_columns <- c("red", "green", "blue")
df <- data.frame(
age = c(1,2,3),
color = c("red", "green", "blue")
)
cbind(age = df$age, `colnames<-`(as.data.frame(t(sapply(df$color,
function(x) as.numeric(x == my_columns)))), my_columns))
#> age red green blue
#> 1 1 1 0 0
#> 2 2 0 1 0
#> 3 3 0 0 1
df <- data.frame(
age = 33, color = "red"
)
cbind(age = df$age, `colnames<-`(as.data.frame(t(sapply(df$color,
function(x) as.numeric(x == my_columns)))), my_columns))
#> age red green blue
#> 1 33 1 0 0
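As an alternative sketch (not part of the approach above), base R's `model.matrix()` builds its indicator columns from a factor's declared levels rather than from the values actually present, so it should also cope with a single row. This assumes `color` is stored as a factor with all three levels set explicitly; the `0 +` in the formula drops the intercept so that every level gets its own column.

```r
# Single-row data; the factor levels are declared explicitly
# (assumption: the full set of levels is known in advance).
df <- data.frame(age = 33, color = "red")
df$color <- factor(df$color, levels = c("red", "green", "blue"))

# `0 +` removes the intercept, so color gets one indicator column per level.
dummies <- model.matrix(~ 0 + color, data = df)

# Combine the untouched column with the indicator columns.
result <- cbind(df["age"], as.data.frame(dummies))
result
```

The resulting columns are named by pasting the variable name and the level together (`colorred`, `colorgreen`, `colorblue`), which happens to match the layout asked for in the question.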
Edit
A more complete solution, which can process several columns at once, is to write a function that handles the logic:
expand_factors <- function(df, columns)
{
for(column in columns){
if(is.character(df[[column]])) df[[column]] <- factor(df[[column]])
my_columns <- levels(df[[column]])
mat <- t(sapply(df[[column]], function(x) as.numeric(x == my_columns)))
new_cols <- setNames(as.data.frame(mat), my_columns)
df <- cbind(df[which(names(df) != column)], new_cols)
}
df
}
So if I have this data frame:
df <- data.frame(age = 1:3,
shoe_size = 4:6,
colors = c("red", "green", "blue"),
fruits = c("apples", "bananas", "cherries"),
temp = factor(rep("cold", 3), levels = c("hot", "cold")))
df
#> age shoe_size colors fruits temp
#> 1 1 4 red apples cold
#> 2 2 5 green bananas cold
#> 3 3 6 blue cherries cold
Then I can expand whichever factors I like by doing this:
expand_factors(df, c("colors", "fruits", "temp"))
#> age shoe_size blue green red apples bananas cherries hot cold
#> 1 1 4 0 0 1 1 0 0 0 1
#> 2 2 5 0 1 0 0 1 0 0 1
#> 3 3 6 1 0 0 0 0 1 0 1
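To tie this back to the single-row case in the question: because `expand_factors()` takes its column names from `levels()`, a column that is already a factor with all of its levels declared should expand to the full set of columns even when only one row is present. A minimal check, repeating the function so the snippet runs on its own (the data frame `one_row` is made up for illustration):

```r
# expand_factors() repeated from the answer so this runs standalone.
expand_factors <- function(df, columns) {
  for (column in columns) {
    if (is.character(df[[column]])) df[[column]] <- factor(df[[column]])
    my_columns <- levels(df[[column]])
    mat <- t(sapply(df[[column]], function(x) as.numeric(x == my_columns)))
    new_cols <- setNames(as.data.frame(mat), my_columns)
    df <- cbind(df[which(names(df) != column)], new_cols)
  }
  df
}

# Hypothetical single-row input; the levels are declared up front.
one_row <- data.frame(age = 33)
one_row$color <- factor("red", levels = c("red", "green", "blue"))

expanded <- expand_factors(one_row, "color")
expanded
```

Note that this only helps when the levels are declared beforehand. A plain character column is converted with `factor()` inside the function, which sees only the values present, so a single-row character column would still produce a single indicator column.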
Created on 2020-08-20 by the reprex package (v0.3.0)