I have a survey question in which respondents could select multiple answers (for 16 possible combinations, eg "Which color do you like?" can result in responses "red, blue, green, yellow" or "red, blue, green, black" etc.
These 16 possible combinations are contained in a spreadsheet:
Image 1: First two rows of the spreadsheet (full spreadsheet has 16 rows)
Example 1:
structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("red", "ruby"), class = "factor"),
V2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L), .Label = c("blue", "violet"), class = "factor"),
V3 = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 2L), .Label = c("green", "turqoise"), class = "factor"),
V4 = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L), .Label = c("black", "yellow"), class = "factor")), .Names = c("V1",
"V2", "V3", "V4"), class = "data.frame", row.names = c(NA, -16L
))
The dataframe with responses has sixteen columns for this question (one column per every simple combination of colors). If respondent 1 selected the first combination, only the first column contains data; similarly, if respondent 2 selected the second combination, the second column contains data. The other are empty:
Image 2: The first two columns of the dataframe
Example 2:
structure(list(respondentID = 1:16, v1 = c(1L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v2 = c(NA, 1L, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v3 = c(NA,
NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA),
v4 = c(NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L, 1L, NA,
NA, NA, NA), v5 = c(NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA), v6 = c(NA, 1L, NA, NA, NA, NA, NA,
NA, NA, 1L, NA, NA, NA, NA, NA, NA), v7 = c(NA, NA, NA, NA,
1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v8 = c(NA,
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), v9 = c(NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,
NA, NA, NA, NA), v10 = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), v11 = c(NA, NA, NA, NA,
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA), v12 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA
), v13 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 1L, NA, NA), v14 = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), v15 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v16 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L
)), .Names = c("respondentID", "v1", "v2", "v3", "v4", "v5",
"v6", "v7", "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15",
"v16"), class = "data.frame", row.names = c(NA, -16L))
(Of course, in practice respondent 1 didn't necessarily choose combination 1).
All the information in the dataframe is the number "1", which corresponds to appropriate combination in the spreadsheet.
In order to analyze responses to the question, I need to extract the combination from the spreadsheet and import it into the dataframe with responses, so that I get four new columns in the dataframe with the combination of colors chosen by a respondent (eg red, blue, green, yellow for respondent 1).
I don't think there's any way to do this using apply, so I guess I need to write a for loop to extract and import the data. Any advice on how to do this?
If you put the second data frame into a long shape, you can filter for just the combinations each person chose, and then join the second data frame with the first. The two data frames have combination labels that can be reconciled between the two to join on.
Note that I changed the column names in the first data frame, df1_with_id
, to be color1
, etc, only because otherwise you would have v1
, v2
, ... in one data frame, and V1
, V2
, ... representing something different in the other. Not a necessary change, but it's good to keep from confusing what different variables mean.
library(tidyverse)
df1 <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("red", "ruby"), class = "factor"),V2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,1L, 1L, 2L, 2L, 2L, 2L), .Label = c("blue", "violet"), class = "factor"),V3 = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,2L, 2L, 1L, 1L, 2L, 2L), .Label = c("green", "turqoise"), class = "factor"),V4 = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,2L, 1L, 2L, 1L, 2L, 1L), .Label = c("black", "yellow"), class = "factor")), .Names = c("V1","V2", "V3", "V4"), class = "data.frame", row.names = c(NA, -16L))
df2 <- structure(list(respondentID = 1:16, v1 = c(1L, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v2 = c(NA, 1L, NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v3 = c(NA,NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA),v4 = c(NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L, 1L, NA,NA, NA, NA), v5 = c(NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA), v6 = c(NA, 1L, NA, NA, NA, NA, NA,NA, NA, 1L, NA, NA, NA, NA, NA, NA), v7 = c(NA, NA, NA, NA,1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v8 = c(NA,NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v9 = c(NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,NA, NA, NA, NA), v10 = c(NA, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA), v11 = c(NA, NA, NA, NA,NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA), v12 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA), v13 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,NA, 1L, NA, NA), v14 = c(NA, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA), v15 = c(NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v16 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L)), .Names = c("respondentID", "v1", "v2", "v3", "v4", "v5","v6", "v7", "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15","v16"), class = "data.frame", row.names = c(NA, -16L))
df1_with_id <- df1 %>%
setNames(paste0("color", 1:4)) %>%
mutate(combo = paste0("v", row_number()))
head(df1_with_id)
#> color1 color2 color3 color4 combo
#> 1 red blue green yellow v1
#> 2 red blue green black v2
#> 3 red blue turqoise yellow v3
#> 4 red blue turqoise black v4
#> 5 red violet green yellow v5
#> 6 red violet green black v6
df2 %>%
gather(key = combo, value = val, -respondentID) %>%
filter(!is.na(val)) %>%
left_join(df1_with_id, by = "combo")
#> respondentID combo val color1 color2 color3 color4
#> 1 1 v1 1 red blue green yellow
#> 2 2 v2 1 red blue green black
#> 3 7 v3 1 red blue turqoise yellow
#> 4 4 v4 1 red blue turqoise black
#> 5 11 v4 1 red blue turqoise black
#> 6 12 v4 1 red blue turqoise black
#> 7 3 v5 1 red violet green yellow
#> 8 2 v6 1 red violet green black
#> 9 10 v6 1 red violet green black
#> 10 5 v7 1 red violet turqoise yellow
#> 11 6 v8 1 red violet turqoise black
#> 12 8 v9 1 ruby blue green yellow
#> 13 9 v11 1 ruby blue turqoise yellow
#> 14 13 v12 1 ruby blue turqoise black
#> 15 14 v13 1 ruby violet green yellow
#> 16 16 v16 1 ruby violet turqoise black
Created on 2018-05-08 by the reprex package (v0.2.0).
I'm not sure to understand what you want to do. Using the tidyverse packages and the melt() function of the reshape2 package, you might try
df_respondent_combination <-
df_respondent %>%
melt(measure.vars = c(2:ncol(.)), na.rm = T) %>%
cbind(df_combination) %>%
select(-variable, -value) %>%
arrange(respondentID)
With df_respondent_combination the new dataframe expected, df_respondent your example 2 and df_combination your example 1.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.