Independent Variable and Choice Category in R

I have a data set like the following. The first column is the respondent ID and the second column holds the age group.

dat1 <- read.table(header=TRUE, text="
ID  Age  
8645    15-24  
6228    35-44  
5830    15-24  
1844    25-34  
")
    ID   Age
1 8645 15-24
2 6228 35-44
3 5830 15-24
4 1844 25-34

I want to convert the age categories into binary (dummy) variables, one per category. There are several ways to do this; the correlationfunnel package makes it easy:

library(correlationfunnel)
library(dplyr)
dat1 %>%
  select(-ID) %>%
  binarize()
  `Age__15-24` `Age__25-34` `Age__35-44`
         <dbl>        <dbl>        <dbl>
1            1            0            0
2            0            0            1
3            1            0            0
4            0            1            0
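As one of the other options, the same one-hot encoding can be sketched in base R with model.matrix(). This is only an illustration, not part of the original question; the object name onehot is made up here, and the column names differ from the binarize() output (there is no "__" separator).

```r
# Same example data as in the question
dat1 <- read.table(header = TRUE, text = "
ID  Age
8645    15-24
6228    35-44
5830    15-24
1844    25-34
")

# `- 1` drops the intercept, so every age level gets its own 0/1 column
onehot <- model.matrix(~ Age - 1, data = dat1)

colnames(onehot)
# "Age15-24" "Age25-34" "Age35-44"
```

Each row of onehot contains exactly one 1, in the column for that respondent's age group.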

However, for a choice modeling framework I need a matrix like the following, in which each ID is repeated once per age category found in column 2 of dat1. Each row also needs a binary outcome column (AgeInd) that is 1 when the row's category is the one observed for that ID and 0 otherwise.

     ID AgeInd Age_15_24 Age_25_34 Age_35_44
1  8645      1         1         0         0
2  8645      0         0         1         0
3  8645      0         0         0         1
4  6228      0         1         0         0
5  6228      0         0         1         0
6  6228      1         0         0         1
7  5830      1         1         0         0
8  5830      0         0         1         0
9  5830      0         0         0         1
10 1844      0         1         0         0
11 1844      1         0         1         0
12 1844      0         0         0         1

Here's a way using dplyr and tidyr:

library(dplyr)
library(tidyr)

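dat1 %>%
   # flag each observed (ID, Age) pair as the chosen alternative
   mutate(AgeInd = 1) %>%
   # add the unobserved ID x Age combinations with AgeInd = 0 (rows come back sorted by ID)
   complete(ID, Age, fill = list(AgeInd = 0)) %>%
   # col keeps every row distinct so pivot_wider() does not merge rows; n is the dummy value
   mutate(col = row_number(), n = 1) %>%
   # spread Age into one 0/1 column per category
   pivot_wider(names_from = Age, values_from = n, 
               names_prefix = 'Age_', values_fill = list(n = 0)) %>%
   select(-col)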


# A tibble: 12 x 5
#     ID AgeInd `Age_15-24` `Age_25-34` `Age_35-44`
#   <int>  <dbl>       <dbl>       <dbl>       <dbl>
# 1  1844      0           1           0           0
# 2  1844      1           0           1           0
# 3  1844      0           0           0           1
# 4  5830      1           1           0           0
# 5  5830      0           0           1           0
# 6  5830      0           0           0           1
# 7  6228      0           1           0           0
# 8  6228      0           0           1           0
# 9  6228      1           0           0           1
#10  8645      1           1           0           0
#11  8645      0           0           1           0
#12  8645      0           0           0           1
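If the original ID order and the underscore column names from the question (Age_15_24 and so on) matter, a base R version might look like the sketch below. It assumes each ID appears exactly once in dat1; the names lv, k, long, dummies and choice are made up for this illustration.

```r
dat1 <- read.table(header = TRUE, text = "
ID  Age
8645    15-24
6228    35-44
5830    15-24
1844    25-34
")

lv <- sort(unique(dat1$Age))   # the alternatives in the choice set
k  <- length(lv)

# One block of k rows per respondent, in the original ID order
long <- data.frame(
  ID  = rep(dat1$ID, each = k),
  alt = rep(lv, times = nrow(dat1))
)

# AgeInd is 1 on the row whose alternative equals the observed category
long$AgeInd <- as.integer(long$alt == rep(dat1$Age, each = k))

# Alternative-specific dummies: pick rows of the identity matrix
dummies <- diag(k)[match(long$alt, lv), , drop = FALSE]
colnames(dummies) <- paste0("Age_", gsub("-", "_", lv))

choice <- cbind(long[c("ID", "AgeInd")], dummies)
choice
```

Here choice has 12 rows and the columns ID, AgeInd, Age_15_24, Age_25_34 and Age_35_44, and AgeInd sums to 1 within every ID.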
