
In R, create binary columns conditional on character columns

library(stringr)  # provides str_detect()

disorder <- c("depression","adhd","anxiety","bipolar",NA)
keywords <- c("depression | depressive", "adhd","anxiety","bi","n/a")
df1 <- as.data.frame(cbind(disorder,keywords))

survey <- c("depression adhd",
        "bipolar disorder",
        "bi  adhd",
        "adhd  anxiety",
        "depressive",
        "adhd bi",
        "n/a")
df2 <- as.data.frame(survey)
df2$depression <- ifelse(str_detect(df2$survey,df1$keywords[1]),"yes","no")
df2$adhd <- ifelse(str_detect(df2$survey,df1$keywords[2]),"yes","no")
df2$anxiety <- ifelse(str_detect(df2$survey, df1$keywords[3]),"yes","no")
df2$bipolar <- ifelse(str_detect(df2$survey, df1$keywords[4]),"yes","no")
df2$na <-  ifelse(str_detect(df2$survey, df1$keywords[5]),"yes","no")
df2

            survey depression adhd anxiety bipolar  na
1  depression adhd        yes  yes      no      no  no
2 bipolar disorder         no   no      no     yes  no
3         bi  adhd         no  yes      no     yes  no
4    adhd  anxiety         no  yes     yes      no  no
5       depressive        yes   no      no      no  no   # edited: this should be yes
6          adhd bi         no  yes      no     yes  no
7              n/a         no   no      no      no yes

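I am trying to match the survey responses against the keywords so that I can produce a table like the one above. Can I do this with some kind of loop? I have a very long list of disorders, so I would really like reusable code rather than writing each column by hand.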
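Since the question asks specifically about a loop, here is a minimal base-R sketch of one. It assumes the keyword patterns have already had their spaces removed ("depression|depressive" rather than "depression | depressive"), and `col_names` is a hypothetical helper that maps the NA disorder to "na" so it can be used as a column name. It is an illustration, not code from the question or the answers.

```r
# Minimal sketch: a plain for loop over the keyword table (base R only).
# Assumption: the keyword patterns contain no stray spaces.
disorder <- c("depression", "adhd", "anxiety", "bipolar", NA)
keywords <- c("depression|depressive", "adhd", "anxiety", "bi", "n/a")

survey <- c("depression adhd", "bipolar disorder", "bi  adhd",
            "adhd  anxiety", "depressive", "adhd bi", "n/a")
df2 <- data.frame(survey, stringsAsFactors = FALSE)

# Hypothetical column names: the disorder name, with NA mapped to "na"
col_names <- ifelse(is.na(disorder), "na", disorder)

# One iteration per disorder: test every response against its pattern
for (i in seq_along(keywords)) {
  hit <- grepl(keywords[i], df2$survey)
  df2[[col_names[i]]] <- ifelse(hit, "yes", "no")
}
df2
```

Adding a disorder then only means extending `disorder` and `keywords`; the loop body stays the same.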

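Remove the spaces from the keywords column of df1. Written with spaces, the regex "depression | depressive" matches either "depression " (with a trailing space) or " depressive" (with a leading space), so a response consisting only of "depressive" is not detected.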

df1 <- transform(df1,  keywords = gsub('\\s', '', keywords))
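The effect of the spaces can be checked directly by testing both versions of the pattern against the lone response "depressive". This is a minimal base-R sketch; the variable names are illustrative, and `pattern_pipes_only` is a hypothetical alternative that removes only the spaces around "|", which would keep the inner spaces of any multi-word keyword intact.

```r
# In a regex, "depression | depressive" means "depression " OR " depressive"
pattern_with_spaces <- "depression | depressive"

# Same cleanup as the transform() call above: drop every whitespace character
pattern_trimmed <- gsub("\\s", "", pattern_with_spaces)

# Hypothetical alternative: drop only the spaces surrounding "|"
pattern_pipes_only <- gsub("\\s*\\|\\s*", "|", pattern_with_spaces)

grepl(pattern_with_spaces, "depressive")  # FALSE: no space before "depressive"
grepl(pattern_trimmed, "depressive")      # TRUE
```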

Using the tidyverse, you can do the following:

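library(tidyverse)

# Add one yes/no column per keyword pattern, then name the new columns
# (everything except the first column, survey) after the patterns
result <- bind_cols(df2, map_dfc(df1$keywords,
                         ~ifelse(str_detect(df2$survey, .x), "yes", "no"))) %>%
          rename_with(~df1$keywords, -1)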
result
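For comparison, roughly the same table can be built without purrr or dplyr. This minimal sketch (an alternative, not the answer's code) uses base `vapply()`, which returns a character matrix with one column per pattern and, for an unnamed character input, uses the patterns themselves as column names; `check.names = FALSE` keeps names such as "n/a" unchanged.

```r
keywords <- c("depression|depressive", "adhd", "anxiety", "bi", "n/a")
survey <- c("depression adhd", "bipolar disorder", "bi  adhd",
            "adhd  anxiety", "depressive", "adhd bi", "n/a")

# 7 x 5 character matrix: rows are responses, columns are patterns
flags <- vapply(keywords,
                function(p) ifelse(grepl(p, survey), "yes", "no"),
                character(length(survey)))

# Attach the flags next to the responses, keeping the pattern names as-is
result <- data.frame(survey, flags, check.names = FALSE,
                     stringsAsFactors = FALSE)
result
```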

#            survey depression|depressive adhd anxiety  bi n/a
#1  depression adhd                   yes  yes      no  no  no
#2 bipolar disorder                    no   no      no yes  no
#3         bi  adhd                    no  yes      no yes  no
#4    adhd  anxiety                    no  yes     yes  no  no
#5       depressive                   yes   no      no  no  no
#6          adhd bi                    no  yes      no yes  no
#7              n/a                    no   no      no  no yes

In base R, you can do this with lapply:

# One yes/no column per keyword, named after the keyword pattern
df2[df1$keywords] <- lapply(df1$keywords, function(x)
                       ifelse(grepl(x, df2$survey), 'yes', 'no'))
df2
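Because the title asks for binary columns, here is a minimal variation of the same `lapply()` idea that stores 1/0 instead of "yes"/"no" and names the columns after the disorders. The data are the question's, with spaces already removed from the patterns; `col_names` is a hypothetical helper that replaces the NA disorder with "na", since a missing value cannot serve as a column name.

```r
disorder <- c("depression", "adhd", "anxiety", "bipolar", NA)
keywords <- c("depression|depressive", "adhd", "anxiety", "bi", "n/a")
survey <- c("depression adhd", "bipolar disorder", "bi  adhd",
            "adhd  anxiety", "depressive", "adhd bi", "n/a")
df2 <- data.frame(survey, stringsAsFactors = FALSE)

# Hypothetical column names: NA becomes "na"
col_names <- ifelse(is.na(disorder), "na", disorder)

# grepl() gives TRUE/FALSE; as.integer() turns that into 1/0
df2[col_names] <- lapply(keywords, function(x) as.integer(grepl(x, df2$survey)))
df2
```

Integer columns are easy to summarise afterwards, for example `colSums(df2[-1])` counts the responses mentioning each disorder.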

You can also do this without ifelse:

df2[df1$keywords] <- lapply(df1$keywords, function(x) c("no", "yes")[grepl(x, df2$survey) + 1])
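The indexing trick in that line works because adding 1 to a logical vector turns FALSE into 1 and TRUE into 2, which then select the first ("no") or second ("yes") element of `c("no", "yes")`. A minimal sketch on three of the survey responses:

```r
survey <- c("depression adhd", "bipolar disorder", "n/a")

hit <- grepl("adhd", survey)   # TRUE FALSE FALSE
idx <- hit + 1                 # 2 1 1 (logical coerced to numeric)
labels <- c("no", "yes")[idx]  # "yes" "no" "no"
labels
```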


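Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.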

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM