Rstudio Columns Multiple Binary Features

I want to split a column into multiple binary dummy columns. My data frame, df:

id size age
1 6 10
2 7 11
3 8 10

At the moment I have this code, using the qdapTools and caret packages:

df <- cbind(df[1:3], mtabulate(strsplit(as.character(df$age), ":")))

My question: how can I name these dummy columns so that I get this:

id size age_10 age_11    
1 6 1 0    
2 7 0 1    
3 8 1 0

To rename by index:

colnames(df)[4:5] <- c("age_10", "age_11")

To rename by existing column name:

colnames(df)[colnames(df) == "INSERT_COL_NAME"] <- "NEW_COL_NAME"
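If there are many dummy columns, typing each new name by hand gets tedious. Here is a minimal sketch that prefixes them all at once. It assumes the dummy columns come out of the cbind() call named after the raw values ("10", "11") and sit in positions 4 onward; the data frame below is a hand-built stand-in for that result, not actual mtabulate() output.

```r
# Stand-in for the cbind()/mtabulate() result (assumed layout and names)
df <- data.frame(id = 1:3, size = 6:8, age = c(10L, 11L, 10L),
                 `10` = c(1L, 0L, 1L), `11` = c(0L, 1L, 0L),
                 check.names = FALSE)

# Prefix every dummy column (position 4 onward) with "age_"
dummy_cols <- 4:ncol(df)
colnames(df)[dummy_cols] <- paste0("age_", colnames(df)[dummy_cols])

# Drop the original column so only id, size and the dummies remain
df$age <- NULL
```

Because the names are built with paste0(), the same lines keep working when more age values appear in the data.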

You can try dummy.data.frame from the dummies package.

library(dummies)
library(dplyr)

df %>%
  dummy.data.frame(names="age", sep="_")

Output is:

  id size age_10 age_11
1  1    6      1      0
2  2    7      0      1
3  3    8      1      0
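If the dummies package is not available in your setup, roughly the same table can be built with base R only. This is a sketch of an alternative route, not what dummy.data.frame does internally; the names age_f, mm and result are made up for the example.

```r
df <- data.frame(id = 1:3, size = 6:8, age = c(10L, 11L, 10L))

# Treat age as a factor so each distinct value becomes one indicator column
age_f <- factor(df$age)

# "- 1" drops the intercept, so every level gets its own 0/1 column
mm <- model.matrix(~ age_f - 1)

# Replace the default names ("age_f10", "age_f11") with the wanted ones
colnames(mm) <- paste0("age_", levels(age_f))

result <- cbind(df[c("id", "size")], mm)
```

The dummy columns come out as numeric rather than integer here, which usually makes no difference for modelling.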

Sample data:

df <- structure(list(id = 1:3, size = 6:8, age = c(10L, 11L, 10L)), .Names = c("id", 
"size", "age"), class = "data.frame", row.names = c(NA, -3L))


Update: for this error, which you are getting on your actual data:

Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

you can use the code below:

library(dummies)
library(dplyr)

df %>%
  data.frame() %>%
  dummy.data.frame(names="Verkoopkanaal_groepering", sep="_")
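The thread does not say why the extra data.frame() call helps. One plausible explanation, and this is an assumption, is that the real data is a tibble or similar object, where selecting a single column with [ , col] returns a one-column table (a list) instead of an atomic vector. The sketch below only mimics that behaviour in base R with drop = FALSE, so it runs without extra packages.

```r
df <- data.frame(id = 1:3, size = 6:8, age = c(10L, 11L, 10L))

# A plain data.frame drops a single selected column to an atomic vector
as_vector <- df[, "age"]

# Tibble-style subsetting keeps a one-column table; mimicked here with drop = FALSE
as_table <- df[, "age", drop = FALSE]

is.atomic(as_vector)  # the kind of input sorting functions expect
is.atomic(as_table)   # a list-like object, which sorting functions reject
```

Checking class(df) on the real data is a quick way to see whether this applies.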
