
Recode Multiple Columns to Single Variable

I have some qualitative data that I have coded into various categories, and I want to provide summaries for subgroups. The RQDA package is great for coding interviews, but I've struggled with creating summaries for open-ended survey responses. I've managed to export the coded file to HTML and copy/paste it into Excel. I now have 500 lines with all the categories in distinct columns; however, the same code may appear in different columns. For example, some data:

a <- c("ResponseA", "ResponseB", "ResponseC", "ResponseD", "NA")
b <- c("ResponseD", "ResponseC", "NA", "NA","NA")
c <- c("ResponseB", "ResponseA", "ResponseE", "NA", "NA")
d <- c("ResponseC", "ResponseB", "ResponseA", "NA", "NA")
df <- data.frame (a,b,c,d)

I'd like to be able to run something like

df$ResponseA <- recode (df$a | df$b | df$c, "
                        'ResponseA' = '1'; 
                         else='0' ")
df$ResponseB <- recode (df$a | df$b | df$c, "
                        'ResponseB' = '1'; 
                         else='0' ")

In short, I'd like to scan 9 columns and recode them into a single binary variable.

If I understand the question correctly, perhaps you can try something like this:

## Convert your data into a long format first
dfL <- cbind(id = sequence(nrow(df)), stack(lapply(df, as.character)))

## The next three lines are mostly cleanup
dfL$id <- factor(dfL$id, sequence(nrow(df)))
dfL$values[dfL$values == "NA"] <- NA
dfL <- dfL[complete.cases(dfL), ]

## `table` is the real workhorse here
cbind(df, (table(dfL[1:2]) > 0) * 1)
#           a         b         c         d ResponseA ResponseB ResponseC ResponseD ResponseE
# 1 ResponseA ResponseD ResponseB ResponseC         1         1         1         1         0
# 2 ResponseB ResponseC ResponseA ResponseB         1         1         1         0         0
# 3 ResponseC        NA ResponseE ResponseA         1         0         1         0         1
# 4 ResponseD        NA        NA        NA         0         0         0         1         0
# 5        NA        NA        NA        NA         0         0         0         0         0
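If you would rather stay close to the "scan several columns for one code" idea from the question, a minimal sketch along these lines may also work. It assumes the codes are stored as character strings (with "NA" as a literal string, as in the example data); `has_code` is a hypothetical helper name, not a function from any package.

```r
# Example data from the question; "NA" is a literal string here
df <- data.frame(
  a = c("ResponseA", "ResponseB", "ResponseC", "ResponseD", "NA"),
  b = c("ResponseD", "ResponseC", "NA", "NA", "NA"),
  c = c("ResponseB", "ResponseA", "ResponseE", "NA", "NA"),
  d = c("ResponseC", "ResponseB", "ResponseA", "NA", "NA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 if `code` appears in any column of a row, else 0.
# `data == code` gives a logical matrix; rowSums counts the matches per row.
has_code <- function(data, code) {
  as.integer(rowSums(data == code, na.rm = TRUE) > 0)
}

cols <- c("a", "b", "c", "d")

# A single binary variable for one code
df$ResponseA <- has_code(df[cols], "ResponseA")

# Or one indicator column per code, skipping the "NA" placeholder
codes <- sort(setdiff(unique(unlist(df[cols])), "NA"))
ind <- sapply(codes, function(code) has_code(df[cols], code))
```

Here `ind` is an integer matrix with one column per code, which can be attached with `cbind(df[cols], ind)` if needed.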

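You can also try the following, which skips the reshaping step. Note that the string "NA" is counted as a category here, so the result includes an NA column: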

(table(rep(1:nrow(df), ncol(df)), unlist(df)) > 0) * 1L
#    
#     NA ResponseA ResponseB ResponseC ResponseD ResponseE
#   1  0         1         1         1         1         0
#   2  0         1         1         1         0         0
#   3  1         1         0         1         0         1
#   4  1         0         0         0         1         0
#   5  1         0         0         0         0         0
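If the NA column is not wanted, one possible follow-up is to drop it by name and bind the rest back onto the original data. This is only a sketch under the same assumption as the example data (missing codes stored as the string "NA"); the names `ids`, `vals`, `ind` and `out` are illustrative.

```r
# Example data from the question; "NA" is a literal string here
df <- data.frame(
  a = c("ResponseA", "ResponseB", "ResponseC", "ResponseD", "NA"),
  b = c("ResponseD", "ResponseC", "NA", "NA", "NA"),
  c = c("ResponseB", "ResponseA", "ResponseE", "NA", "NA"),
  d = c("ResponseC", "ResponseB", "ResponseA", "NA", "NA"),
  stringsAsFactors = FALSE
)

# unlist() walks the data frame column by column, so the row ids
# 1..nrow(df) are repeated once per column to line up with the values
vals <- unlist(df, use.names = FALSE)
ids  <- rep(seq_len(nrow(df)), times = ncol(df))

# Cross-tabulate row id against code, then reduce counts to 0/1
ind <- (table(ids, vals) > 0) * 1L

# Drop the placeholder column; keeping "NA" until this point means rows
# with no codes at all (row 5 here) still get a row of zeros
ind <- ind[, colnames(ind) != "NA", drop = FALSE]

out <- cbind(df, ind)
```

Converting the "NA" strings to real missing values before calling `table` would also remove the column, but a row with no codes would then vanish from the table unless the ids are made a factor with all levels, as in the first answer.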
