I have a data set like the following. The first column is the ID and the second column holds age groups.
dat1 <- read.table(header=TRUE, text="
ID Age
8645 15-24
6228 35-44
5830 15-24
1844 25-34
")
ID Age
1 8645 15-24
2 6228 35-44
3 5830 15-24
4 1844 25-34
I want to convert the age categories into binary (dummy) variables, one per category. There are several options; correlationfunnel is easy to use here:
library(correlationfunnel)
library(dplyr)
dat1 %>%
  select(-ID) %>%
  binarize()
`Age__15-24` `Age__25-34` `Age__35-44`
<dbl> <dbl> <dbl>
1 1 0 0
2 0 0 1
3 1 0 0
4 0 1 0
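For comparison, here is a base-R sketch of the same one-hot encoding using model.matrix(). Dropping the intercept with "- 1" keeps one 0/1 column for every age level. This is just one of the other options, not what correlationfunnel does internally, and the columns are named Age15-24, Age25-34 and Age35-44 rather than Age__15-24 and so on.

```r
# Sample data from the question
dat1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Age
8645 15-24
6228 35-44
5830 15-24
1844 25-34
")

# "- 1" removes the intercept, so every age level gets its own 0/1 column
dummies <- model.matrix(~ Age - 1, data = dat1)
dummies
```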
However, for a choice-modeling framework I need a matrix like the following, in which each ID's row is repeated once for every category in column 2 (Age) of dat1. Each row also needs a binary outcome column (AgeInd) that is 1 for the category the individual actually belongs to and 0 otherwise.
ID AgeInd Age_15_24 Age_25_34 Age_35_44
1 8645 1 1 0 0
2 8645 0 0 1 0
3 8645 0 0 0 1
4 6228 0 1 0 0
5 6228 0 0 1 0
6 6228 1 0 0 1
7 5830 1 1 0 0
8 5830 0 0 1 0
9 5830 0 0 0 1
10 1844 0 1 0 0
11 1844 1 0 1 0
12 1844 0 0 0 1
Here's a way using dplyr and tidyr:
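library(dplyr)
library(tidyr)
dat1 %>%
  # flag the observed (ID, Age) combination
  mutate(AgeInd = 1) %>%
  # add every missing ID x Age combination, with AgeInd = 0
  complete(ID, Age, fill = list(AgeInd = 0)) %>%
  # col keeps each row unique during the pivot; n is the dummy value
  mutate(col = row_number(), n = 1) %>%
  # one 0/1 column per age category, 0 where the category does not apply
  pivot_wider(names_from = Age, values_from = n,
              names_prefix = 'Age_', values_fill = list(n = 0)) %>%
  # drop the helper column
  select(-col)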
# A tibble: 12 x 5
# ID AgeInd `Age_15-24` `Age_25-34` `Age_35-44`
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1844 0 1 0 0
# 2 1844 1 0 1 0
# 3 1844 0 0 0 1
# 4 5830 1 1 0 0
# 5 5830 0 0 1 0
# 6 5830 0 0 0 1
# 7 6228 0 1 0 0
# 8 6228 0 0 1 0
# 9 6228 1 0 0 1
#10 8645 1 1 0 0
#11 8645 0 0 1 0
#12 8645 0 0 0 1
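Note that complete() sorts by ID and Age, so in the output above the IDs come out in ascending order instead of the original order, and the dummy columns keep the hyphen (Age_15-24). If the exact layout from the question matters, the following base-R sketch (no extra packages; the names lev, long and out are only illustrative) builds the matrix directly, repeating each ID once per category in the original row order and naming the columns Age_15_24 and so on.

```r
dat1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Age
8645 15-24
6228 35-44
5830 15-24
1844 25-34
")

lev <- sort(unique(dat1$Age))   # the age categories (the "alternatives")
k <- length(lev)

# One row per (ID, candidate category), keeping the original ID order
long <- data.frame(ID  = rep(dat1$ID, each = k),
                   Alt = rep(lev, times = nrow(dat1)),
                   stringsAsFactors = FALSE)

# AgeInd = 1 on the row whose candidate category is the observed one
long$AgeInd <- as.integer(long$Alt == rep(dat1$Age, each = k))

# One 0/1 column per category, renamed from "15-24" to "Age_15_24" etc.
dummies <- sapply(lev, function(l) as.integer(long$Alt == l))
colnames(dummies) <- paste0("Age_", gsub("-", "_", lev))

out <- cbind(long[c("ID", "AgeInd")], dummies)
out
```

Because every ID gets every category, the dummy block repeats identically for each ID; only AgeInd carries the individual's actual category, which is what a one-row-per-alternative choice layout relies on.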