Independent Variable and Choice Category in R
I have a data set like the following. The first column is the ID and the second column holds age groups.
dat1 <- read.table(header=TRUE, text="
ID Age
8645 15-24
6228 35-44
5830 15-24
1844 25-34
")
ID Age
1 8645 15-24
2 6228 35-44
3 5830 15-24
4 1844 25-34
I want to convert the age categories into binary (indicator) variables, one per category. There are several options; correlationfunnel is easy to use here.
library(correlationfunnel)
library(dplyr)
dat1 %>%
  select(-ID) %>%
  binarize()
`Age__15-24` `Age__25-34` `Age__35-44`
<dbl> <dbl> <dbl>
1 1 0 0
2 0 0 1
3 1 0 0
4 0 1 0
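As one of the other options, the same one-hot encoding can be sketched in base R with `model.matrix()`. This is only an illustration, not part of the original question: the `Age_` column prefix is an assumption chosen to resemble the output above, and `dummies` is a hypothetical name.

```r
dat1 <- read.table(header = TRUE, text = "
ID Age
8645 15-24
6228 35-44
5830 15-24
1844 25-34
")

# Treat Age as a factor so that each level becomes its own column.
dat1$Age <- factor(dat1$Age)

# `~ Age - 1` drops the intercept, so every level gets an indicator column.
dummies <- model.matrix(~ Age - 1, data = dat1)

# Assumed naming scheme: prefix each level with "Age_".
colnames(dummies) <- paste0("Age_", levels(dat1$Age))
dummies
```

Each row of `dummies` contains a single 1 in the column for that row's age group.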
However, for the choice modeling framework, I need to generate a matrix like the following, in which each ID is repeated once for every category found in column 2 of dat1. Each row also needs a binary outcome column (AgeInd) that is 1 when the row's category is the one observed for that ID and 0 otherwise.
ID AgeInd Age_15_24 Age_25_34 Age_35_44
1 8645 1 1 0 0
2 8645 0 0 1 0
3 8645 0 0 0 1
4 6228 0 1 0 0
5 6228 0 0 1 0
6 6228 1 0 0 1
7 5830 1 1 0 0
8 5830 0 0 1 0
9 5830 0 0 0 1
10 1844 0 1 0 0
11 1844 1 0 1 0
12 1844 0 0 0 1
Here's a way using dplyr and tidyr:
library(dplyr)
library(tidyr)
dat1 %>%
  mutate(AgeInd = 1) %>%
  complete(ID, Age, fill = list(AgeInd = 0)) %>%
  mutate(col = row_number(), n = 1) %>%
  pivot_wider(names_from = Age, values_from = n,
              names_prefix = 'Age_', values_fill = list(n = 0)) %>%
  select(-col)
# A tibble: 12 x 5
# ID AgeInd `Age_15-24` `Age_25-34` `Age_35-44`
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1844 0 1 0 0
# 2 1844 1 0 1 0
# 3 1844 0 0 0 1
# 4 5830 1 1 0 0
# 5 5830 0 0 1 0
# 6 5830 0 0 0 1
# 7 6228 0 1 0 0
# 8 6228 0 0 1 0
# 9 6228 1 0 0 1
#10 8645 1 1 0 0
#11 8645 0 0 1 0
#12 8645 0 0 0 1
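The same long format can also be sketched in base R, which may be useful if the original ID order and the underscore column names from the question matter (the `complete()` call above returns IDs in sorted order). This is a minimal sketch under those assumptions, not the answer's method; `ages`, `obs`, `long`, and `Alt` are hypothetical names introduced for illustration.

```r
dat1 <- read.table(header = TRUE, text = "
ID Age
8645 15-24
6228 35-44
5830 15-24
1844 25-34
")

obs  <- as.character(dat1$Age)   # observed age group per ID
ages <- sort(unique(obs))        # the set of alternatives

# One row per (ID, alternative) pair, keeping the original ID order.
long <- data.frame(
  ID  = rep(dat1$ID, each = length(ages)),
  Alt = rep(ages, times = nrow(dat1)),
  stringsAsFactors = FALSE
)

# AgeInd is 1 when the row's alternative is the ID's observed age group.
long$AgeInd <- as.integer(long$Alt == rep(obs, each = length(ages)))

# One indicator column per alternative, named like Age_15_24.
for (a in ages) {
  long[[paste0("Age_", gsub("-", "_", a))]] <- as.integer(long$Alt == a)
}

long$Alt <- NULL   # helper column no longer needed
long
```

The result has 12 rows (4 IDs times 3 age groups) with columns ID, AgeInd, Age_15_24, Age_25_34, and Age_35_44.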