
Recoding race with 4 categories to 3 categories and creating 2 dummies in R

I am working with a race variable that takes the following values: 1 Black, 2 Hispanic, 3 Mixed Race (Non-Hispanic), 4 Non-Black / Non-Hispanic. I want to combine categories 3 and 4 into a single base category and keep Black and Hispanic as they are. I tried to create 2 dummies (Black = 1 and Hispanic = 1). Two extra columns were created, but the values in them are not 1 and 0 but FALSE and TRUE. The code I used:

nlsy2$Hispanic <- nlsy2$Race==2
nlsy2$Black <- nlsy2$Race==1
nlsy2$Race [ nlsy2$Race == 0 ] <- 3
nlsy2$Race [ nlsy2$Race == 0 ] <- 4

Also, when I run summary(nlsy2$Hispanic), R gives me this output:

   Mode   FALSE    TRUE    NA's 
logical    5594    1526       0 

Are the NA's problematic when running a glm? Also, if you have a better way to recode the race variable, it would be much appreciated! Thank you!

Does

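# Collapse categories 3 and 4 into a single base category coded 0
nlsy2$Race[nlsy2$Race == 3 | nlsy2$Race == 4] <- 0
# Store the result as a factor so it is treated as categorical
nlsy2$Race <- factor(nlsy2$Race)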

not do the job? You will want the variable stored as a factor rather than as a numeric when doing any modelling, because the categories are nominal and you don't want to risk the codes being interpreted as numbers.
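As a rough sketch of how this could fit together, the example below uses a small made-up data frame in place of the real nlsy2 (the values, the outcome column y, and the column name Race3 are all assumptions for illustration). It shows that wrapping the comparison in as.integer() gives 1/0 instead of TRUE/FALSE, and that a factor whose first level is the combined category makes that category the reference in glm, so separate dummy columns are not strictly needed. On the NA question: the summary output in the question reports 0 NA's for that column, and by default glm drops rows that contain NA in any model variable.

```r
# Toy data standing in for nlsy2 (hypothetical values, not the real NLSY data)
nlsy2 <- data.frame(
  Race = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  y    = c(1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0)  # hypothetical binary outcome
)

# A comparison such as Race == 1 returns TRUE/FALSE;
# as.integer() converts that to 1/0
nlsy2$Black    <- as.integer(nlsy2$Race == 1)
nlsy2$Hispanic <- as.integer(nlsy2$Race == 2)

# Collapse categories 3 and 4 into one base level (coded 0) and store
# the result as a labelled factor in a new column, leaving Race untouched
nlsy2$Race3 <- factor(
  ifelse(nlsy2$Race %in% c(3, 4), 0, nlsy2$Race),
  levels = c(0, 1, 2),
  labels = c("Other", "Black", "Hispanic")
)

# Check the recoding
table(nlsy2$Race3)

# The first factor level ("Other") is the reference category,
# so the model estimates coefficients for Black and Hispanic only
fit <- glm(y ~ Race3, data = nlsy2, family = binomial)
names(coef(fit))
```

Either route should lead to the same model: passing the factor Race3, or passing the two 0/1 columns Black and Hispanic. The factor version keeps the coding in one place and lets R build the dummies itself.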


 