
Better way to create dummy variables with NA - trying to improve coding I can already do poorly

I have a df with answers to survey questions, where df$Q57 is one of five answers:

  1. "" (<- blank is basically NA)
  2. I would never do this
  3. I will do this in five years
  4. I will do this in 10 years
  5. I will do this eventually

I want to create a dummy variable where:

  1. "" = NA
  2. I would never do this = 0
  3. I will do this in five years = 1
  4. I will do this in 10 years = 1
  5. I will do this eventually = 1

The best way I know how to do this is with a series of ifelse commands:

df$Q57_dummy <- ifelse(df$Q57 == "I would never do this", 0, 1)
df$Q57_dummy <- ifelse(df$Q57 == "", NA, df$Q57_dummy)
table(df$Q57_dummy , useNA = "always")

This works, but I feel like there are cleaner ways to do this, and I was wondering if anyone had suggestions, because I will have to recode survey answers that have more outcomes than just 1, 0, and NA. Thanks!

tidyverse approach:

library(dplyr)

df <- df %>%
    mutate(Q57_dummy = case_when(
        Q57 == "" ~ NA,                          # blank answers become NA
        Q57 == "I would never do this" ~ FALSE,
        TRUE ~ TRUE                              # this is the else condition
    ))
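
To check the behaviour, here is a minimal, self-contained sketch on a made-up data frame (the toy `df` below is an assumption for illustration, not the asker's data). It also converts the logical result to the 0/1 coding asked for in the question with `as.integer()`.

```r
library(dplyr)

# Hypothetical toy data standing in for the survey responses
df <- data.frame(
  Q57 = c("", "I would never do this", "I will do this in five years",
          "I will do this in 10 years", "I will do this eventually"),
  stringsAsFactors = FALSE
)

df <- df %>%
  mutate(
    Q57_dummy = case_when(
      Q57 == "" ~ NA,                          # blank answers become NA
      Q57 == "I would never do this" ~ FALSE,
      TRUE ~ TRUE                              # every other answer
    ),
    Q57_dummy = as.integer(Q57_dummy)          # FALSE/TRUE/NA -> 0/1/NA
  )

# Same check as in the question
table(df$Q57_dummy, useNA = "always")
```

One thing to watch: a true `NA` in `Q57` (as opposed to a blank string) matches neither of the first two conditions and falls through to the else branch, so an extra `is.na(Q57) ~ NA` line may be worth adding if the data can contain both.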

You could take a few different approaches with the else condition, depending on your preferred code style. The above works, but you could also do this with stringr:

str_detect(Q57, "I will do this") ~ TRUE

or manually input the options:

Q57 %in% c("I will do this in five years",...) ~ TRUE
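
Put together, the two alternatives might look like the sketch below. The toy data and the column names `Q57_detect` and `Q57_listed` are assumptions for illustration. Here the else branch returns `NA` instead of `TRUE`, which is a deliberate change from the version above: an unexpected answer then shows up as missing rather than being silently coded as 1.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data; "something unexpected" is included to show the fallback
df <- data.frame(
  Q57 = c("", "I would never do this", "I will do this in five years",
          "I will do this in 10 years", "I will do this eventually",
          "something unexpected"),
  stringsAsFactors = FALSE
)

df <- df %>%
  mutate(
    # Variant 1: pattern match on the shared "I will do this" prefix
    Q57_detect = case_when(
      Q57 == "I would never do this" ~ 0L,
      str_detect(Q57, "I will do this") ~ 1L,
      TRUE ~ NA_integer_                       # blanks and anything unmatched
    ),
    # Variant 2: list every accepted answer explicitly
    Q57_listed = case_when(
      Q57 == "I would never do this" ~ 0L,
      Q57 %in% c("I will do this in five years",
                 "I will do this in 10 years",
                 "I will do this eventually") ~ 1L,
      TRUE ~ NA_integer_                       # blanks and anything unmatched
    )
  )
```

The same structure should extend to questions with more than two outcomes: add one `condition ~ value` line per code, keeping all right-hand sides the same type (integers here).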
