Convert categories in one column to multiple columns coded as 1 or 0 if present or absent in R
I have data that looks like the following:
library(dplyr)
library(tidyr)
a <- tibble(type = c("A", "A", "B", "B", "C", "D"))
print(a)
# A tibble: 6 x 1
type
<chr>
1 A
2 A
3 B
4 B
5 C
6 D
Here type contains categorical information. I am trying to convert each category in type into its own column, coded as 1 if that category is present in the row and 0 if not. The final result would look like:
b <- tibble(A = c(1, 1, 0, 0, 0, 0),
            B = c(0, 0, 1, 1, 0, 0),
            C = c(0, 0, 0, 0, 1, 0),
            D = c(0, 0, 0, 0, 0, 1))
# A tibble: 6 x 4
A B C D
<dbl> <dbl> <dbl> <dbl>
1 1. 0. 0. 0.
2 1. 0. 0. 0.
3 0. 1. 0. 0.
4 0. 1. 0. 0.
5 0. 0. 1. 0.
6 0. 0. 0. 1.
I have tried the following:
a$dat <- 1
spread(a, type, dat)
However, this does not work because some of the categories occur more than once, so spread cannot match each value to a unique row. Any help would be appreciated. Thank you!
This is likely a duplicate -- what you are doing is usually referred to as "one-hot encoding". One way is to leverage model.matrix:
library(tidyverse)
a %>%
  model.matrix(~ . - 1, data = .) %>%  # "- 1" drops the intercept so every level gets its own column
  as_tibble()
# A tibble: 6 x 4
typeA typeB typeC typeD
<dbl> <dbl> <dbl> <dbl>
1 1 0 0 0
2 1 0 0 0
3 0 1 0 0
4 0 1 0 0
5 0 0 1 0
6 0 0 0 1
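The columns above carry a type prefix, whereas the desired output uses plain A, B, C, D. A minimal sketch of one way to strip the prefix, assuming a contains only the type column from the question (the name encoded is introduced here purely for illustration):

```r
library(dplyr)

a <- tibble(type = c("A", "A", "B", "B", "C", "D"))

# One-hot encode without an intercept, then convert the matrix to a tibble
encoded <- as_tibble(model.matrix(~ type - 1, data = a))

# Remove the leading "type" prefix so the columns are named after the levels
names(encoded) <- sub("^type", "", names(encoded))

encoded
```

The prefix is simply the name of the source column, so the same sub() pattern would need adjusting if the column were called something other than type.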
Another option is table from base R:
table(seq_len(nrow(a)), a$type)
# A B C D
# 1 1 0 0 0
# 2 1 0 0 0
# 3 0 1 0 0
# 4 0 1 0 0
# 5 0 0 1 0
# 6 0 0 0 1
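The spread attempt from the question can probably also be made to work by giving every row a unique identifier first, so that repeated categories no longer collide. This is a sketch rather than a definitive fix; the helper column id and the result name wide are hypothetical names chosen for illustration:

```r
library(dplyr)
library(tidyr)

a <- tibble(type = c("A", "A", "B", "B", "C", "D"))

wide <- a %>%
  mutate(id = row_number(), dat = 1) %>%  # id makes each row unique; dat is the value to spread
  spread(type, dat, fill = 0) %>%         # absent categories are filled with 0 instead of NA
  select(-id)                             # drop the helper column again

wide
```

Without the id column, spread sees several rows with the same key and nothing to tell them apart, which is the error the question ran into.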