Independent variables and choice categories in R

Question

I have a dataset like the one below. The first column is an ID. The second column holds the age group.

dat1 <- read.table(header=TRUE, text="
ID  Age  
8645    15-24  
6228    35-44  
5830    15-24  
1844    25-34  
")
    ID   Age
1 8645 15-24
2 6228 35-44
3 5830 15-24
4 1844 25-34

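I want to convert the categories into binary variables, one per category of the variable. There are several options. The correlationfunnel package is easy to use here.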

library(correlationfunnel)
library(dplyr)
dat1 %>%
  select(-ID) %>%
  binarize()
  `Age__15-24` `Age__25-34` `Age__35-44`
         <dbl>        <dbl>        <dbl>
1            1            0            0
2            0            0            1
3            1            0            0
4            0            1            0
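
As one of the "several options" mentioned above, base R can build the same indicator columns without extra packages. This is a minimal sketch that assumes the `dat1` from the question; the object name `dummies` and the `Age_` prefix are illustrative choices, not part of the original post.

```r
# Same toy data as in the question
dat1 <- read.table(header = TRUE, text = "
ID  Age
8645    15-24
6228    35-44
5830    15-24
1844    25-34
")

# One indicator column per age group; "- 1" drops the intercept so that
# every level gets its own column instead of one being the baseline
dummies <- model.matrix(~ Age - 1, data = dat1)

# model.matrix() names the columns "Age15-24", ...; add a separator
colnames(dummies) <- sub("^Age", "Age_", colnames(dummies))
dummies
```

The result is a numeric matrix rather than a tibble, so `cbind(dat1["ID"], dummies)` can be used if the ID column should be kept alongside the indicators.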

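However, for a choice modeling framework I need to generate a matrix like the one below, in which the rows for each ID are repeated once per category found in the second column of dat1. Each row also needs a column (AgeInd) holding the binary outcome.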

     ID AgeInd Age_15_24 Age_25_34 Age_35_44
1  8645      1         1         0         0
2  8645      0         0         1         0
3  8645      0         0         0         1
4  6228      0         1         0         0
5  6228      0         0         1         0
6  6228      1         0         0         1
7  5830      1         1         0         0
8  5830      0         0         1         0
9  5830      0         0         0         1
10 1844      0         1         0         0
11 1844      1         0         1         0
12 1844      0         0         0         1

Answer 1

Here is an approach using dplyr and tidyr:

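library(dplyr)
library(tidyr)

dat1 %>%
   # mark the observed (chosen) age group of each ID
   mutate(AgeInd = 1) %>%
   # add the missing ID/Age combinations as non-chosen rows
   complete(ID, Age, fill = list(AgeInd = 0)) %>%
   # col keeps every row distinct so pivot_wider() does not merge them
   mutate(col = row_number(), n = 1) %>%
   pivot_wider(names_from = Age, values_from = n,
               names_prefix = 'Age_', values_fill = list(n = 0)) %>%
   select(-col)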


# A tibble: 12 x 5
#     ID AgeInd `Age_15-24` `Age_25-34` `Age_35-44`
#   <int>  <dbl>       <dbl>       <dbl>       <dbl>
# 1  1844      0           1           0           0
# 2  1844      1           0           1           0
# 3  1844      0           0           0           1
# 4  5830      1           1           0           0
# 5  5830      0           0           1           0
# 6  5830      0           0           0           1
# 7  6228      0           1           0           0
# 8  6228      0           0           1           0
# 9  6228      1           0           0           1
#10  8645      1           1           0           0
#11  8645      0           0           1           0
#12  8645      0           0           0           1
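
The output above is sorted by ID and keeps the hyphens in the column names, whereas the table in the question lists IDs in their original order and uses underscores. The following base R sketch is one hedged alternative that reproduces that layout; it assumes the `dat1` from the question, and the helper names `ages`, `chosen`, `long` and `alt` are hypothetical.

```r
# Same toy data as in the question
dat1 <- read.table(header = TRUE, text = "
ID  Age
8645    15-24
6228    35-44
5830    15-24
1844    25-34
")

age  <- as.character(dat1$Age)   # guard against factor input
ages <- sort(unique(age))        # the choice alternatives

# One row per (ID, alternative); IDs stay in their original order
long <- data.frame(
  ID  = rep(dat1$ID, each = length(ages)),
  alt = rep(ages, times = nrow(dat1)),
  stringsAsFactors = FALSE
)

# AgeInd is 1 on the row whose alternative is the ID's observed age group
chosen <- rep(age, each = length(ages))
long$AgeInd <- as.integer(long$alt == chosen)

# Indicator columns for the alternative itself, with syntactic names
for (a in ages) {
  long[[paste0("Age_", gsub("-", "_", a))]] <- as.integer(long$alt == a)
}

long$alt <- NULL
long
```

Because every ID receives exactly one row per alternative, `tapply(long$AgeInd, long$ID, sum)` should return 1 for each ID, which is a quick sanity check before passing the data to a choice model.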

Answer 1: accepted, 2020-06-14 05:25:43
