Could you help me understand how to write this function?
I have variables Q1, Q2, ..., Q26. Each is a multiple-choice question, so I need to convert each one into multiple columns. For example, Q6 asks "What fruits do you like?" and Q7 asks "What vegetables do you like?" I have written code for each one (as below). This code works well for a single question (i.e., it turns the contents of Q6 into multiple columns).
fulldata1<-fulldata %>%
separate(Q6, paste0("v", 1:6), sep='┋') %>%
gather(q6, val, v1:v6) %>%
na.exclude %>%
mutate(val=paste0("Q6", val), q6=1) %>%
spread(val, q6)
fulldata1[grep('^Q6', names(fulldata1), value = TRUE)][is.na(fulldata1[grep('^Q6', names(fulldata1), value = TRUE)])] <- 0
Now I want to write a single function to which I can just pass a variable name (Q1, Q2, Q3, ...). I wrote the code below, but it does not work.
multiplechoice <- function(Question) {
fulldata1<-fulldata %>%
separate(Question, paste0("v", 1:6), sep='┋') %>%
gather(q6, val, v1:v6) %>%
na.exclude %>%
mutate(val=paste0("Question", val), q6=1) %>%
spread(val, q6)
fulldata1[grep('^Question', names(fulldata1), value = TRUE)][is.na(fulldata1[grep('^Question', names(fulldata1), value = TRUE)])] <- 0
return(multiplechoice)
}
multiplechoice(Q6)
Could you help point out what I am doing wrong with the use of functions in R? Thanks!
Here is a sample (thank you for reminding me):
structure(list(id = 1:10, Q6 = structure(c(2L, 4L, 1L, 7L, 5L,
6L, 2L, 5L, 3L, 1L), .Label = c("apple", "apple;orange;blueberry",
"apple;peach", "orange;blueberry", "orange;blueberry;peach",
"peach", "peach;apple"), class = "factor"), Q7 = structure(c(9L,
3L, 2L, 1L, 4L, 8L, 6L, 7L, 5L, 5L), .Label = c("cauliflower",
"kale", "kale;spinich", "kale;spinich;cauliflower", "none", "potato;kale",
"potato;spinich;cauliflower", "spinich; kale;cauliflower", "spinich;kale"
), class = "factor")), row.names = c(NA, 10L), class = "data.frame")
I think the path of least resistance here is data.table's tstrsplit:
library(data.table)
setDT(data)[,lapply(colnames(.SD),function(x) {
y <- tstrsplit(.SD[[x]],";")
setNames(as.data.table(y),paste0(paste0(x,"."),1:length(y)))
}),
.SDcols = setdiff(names(data),"id")]
Q6.1 Q6.2 Q6.3 Q7.1 Q7.2 Q7.3
1: apple orange blueberry spinich kale <NA>
2: orange blueberry <NA> kale spinich <NA>
3: apple <NA> <NA> kale <NA> <NA>
4: peach apple <NA> cauliflower <NA> <NA>
5: orange blueberry peach kale spinich cauliflower
6: peach <NA> <NA> spinich kale cauliflower
7: apple orange blueberry potato kale <NA>
8: orange blueberry peach potato spinich cauliflower
9: apple peach <NA> none <NA> <NA>
10: apple <NA> <NA> none <NA> <NA>
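If the same split is needed for arbitrary question columns, the tstrsplit idea can be wrapped in a small function. The following is only a sketch under stated assumptions: the helper name split_choices and its arguments are hypothetical, the separator is assumed to be a literal ";", and stray spaces around choices (as in "spinich; kale") are trimmed.

```r
library(data.table)

# Hypothetical helper: split each column named in `cols` into numbered columns.
split_choices <- function(data, cols, sep = ";") {
  dt <- copy(as.data.table(data))  # work on a copy so the input is left unchanged
  for (col in cols) {
    # tstrsplit() returns one vector per position, padded with NA
    parts <- tstrsplit(as.character(dt[[col]]), sep, fixed = TRUE)
    parts <- lapply(parts, trimws)  # drop stray spaces around each choice
    new_names <- paste0(col, ".", seq_along(parts))
    dt[, (new_names) := parts]  # add Q6.1, Q6.2, ...
    dt[, (col) := NULL]         # drop the original combined column
  }
  dt[]
}

# Small made-up example in the same shape as the sample data
example <- data.frame(id = 1:2, Q6 = c("apple;orange", "peach"))
out <- split_choices(example, "Q6")
```

Passing the column names as strings avoids having to resolve bare names inside the function, so the call for the sample data would be split_choices(data, c("Q6", "Q7")).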
Well, to answer your question, I'd say it's just not especially easy to use separate inside mutate. So I stopped trying and went with the simplest function that worked.
head(fulldata)
#> id Q6 Q7
#> 1 1 apple;orange;blueberry spinich;kale
#> 2 2 orange;blueberry kale;spinich
#> 3 3 apple kale
#> 4 4 peach;apple cauliflower
#> 5 5 orange;blueberry;peach kale;spinich;cauliflower
#> 6 6 peach spinich; kale;cauliflower
Using separate to split on ; (assumes no more than 6 categories right now) and stringr::str_subset to get the variable names that start with "Q":
library(stringr)
library(tidyr)
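# Our own little custom function: Reduce() calls it as sep(data, column_name)
sep <- function(...) {
dots <- list(...)
# dots[[2]] is the column name; separate_() is the deprecated string-based form of separate()
separate_(..., into = sprintf("%s_choice%d", dots[[2]], 1:6), fill = "right", remove = TRUE)
}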
myQuestions <- stringr::str_subset(names(fulldata), "^Q")
separated_data <- fulldata %>% Reduce(f = sep, x = myQuestions)
head(separated_data)
#> id Q6_choice1 Q6_choice2 Q6_choice3 Q6_choice4 Q6_choice5 Q6_choice6
#> 1 1 apple orange blueberry <NA> <NA> <NA>
#> 2 2 orange blueberry <NA> <NA> <NA> <NA>
#> 3 3 apple <NA> <NA> <NA> <NA> <NA>
#> 4 4 peach apple <NA> <NA> <NA> <NA>
#> 5 5 orange blueberry peach <NA> <NA> <NA>
#> 6 6 peach <NA> <NA> <NA> <NA> <NA>
#> Q7_choice1 Q7_choice2 Q7_choice3 Q7_choice4 Q7_choice5 Q7_choice6
#> 1 spinich kale <NA> <NA> <NA> <NA>
#> 2 kale spinich <NA> <NA> <NA> <NA>
#> 3 kale <NA> <NA> <NA> <NA> <NA>
#> 4 cauliflower <NA> <NA> <NA> <NA> <NA>
#> 5 kale spinich cauliflower <NA> <NA> <NA>
#> 6 spinich kale cauliflower <NA> <NA> <NA>
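As for why the function in the question fails: inside the function, separate() treats the bare name Question as a literal column name, paste0("Question", val) builds the literal string "Question", and return(multiplechoice) returns the function itself rather than the data. If the goal is the 0/1 indicator columns from the question, one way around all three problems is to pass the column name as a string. The following base R sketch is an assumption-laden illustration, not the asker's method: the two-argument signature is hypothetical, the separator is assumed to be a literal ";", and choices are trimmed of surrounding spaces.

```r
# Hypothetical helper: replace one multiple-choice column with 0/1 indicator columns.
# `question` is the column name as a string, e.g. "Q6".
multiplechoice <- function(data, question, sep = ";") {
  data <- as.data.frame(data)
  # Split every answer into its choices; trimws() handles entries like "spinich; kale"
  choices <- lapply(strsplit(as.character(data[[question]]), sep, fixed = TRUE), trimws)
  all_choices <- sort(unique(unlist(choices)))
  # One row per respondent, one column per choice: 1 if selected, 0 otherwise
  flags <- unlist(lapply(choices, function(x) as.integer(all_choices %in% x)))
  indicators <- matrix(flags, nrow = length(choices), ncol = length(all_choices),
                       byrow = TRUE,
                       dimnames = list(NULL, paste0(question, "_", all_choices)))
  # Keep every other column and append the indicator columns
  cbind(data[setdiff(names(data), question)], indicators)
}

# Small made-up example in the same shape as the sample data
example <- data.frame(id = 1:3,
                      Q6 = c("apple;orange", "peach", "apple"),
                      Q7 = c("kale; spinich", "none", "kale"))
# Apply the helper to each question in turn
result <- Reduce(multiplechoice, c("Q6", "Q7"), example)
```

Because the helper takes and returns a data frame, Reduce() can chain it over any vector of question names, for example paste0("Q", 1:26). For the separator used in the question, pass sep = "┋".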
Your data:
fulldata <- structure(list(id = 1:10, Q6 = structure(c(2L, 4L, 1L, 7L, 5L,
6L, 2L, 5L, 3L, 1L), .Label = c("apple", "apple;orange;blueberry",
"apple;peach", "orange;blueberry", "orange;blueberry;peach",
"peach", "peach;apple"), class = "factor"), Q7 = structure(c(9L,
3L, 2L, 1L, 4L, 8L, 6L, 7L, 5L, 5L), .Label = c("cauliflower",
"kale", "kale;spinich", "kale;spinich;cauliflower", "none", "potato;kale",
"potato;spinich;cauliflower", "spinich; kale;cauliflower", "spinich;kale"
), class = "factor")), row.names = c(NA, 10L), class = "data.frame")