
How to write a function in R

Could you help me understand how to write this function?

I have variables Q1, Q2, ..., Q26. Each is a multiple-choice question, so I need to convert each one into multiple columns. For example, Q6 asks "What fruits do you like?" and Q7 asks "What vegetables do you like?". I have written code for each one (as below). This code works well for a single question (i.e., it changes the contents of Q6 into multiple columns):

fulldata1<-fulldata %>% 
  separate(Q6, paste0("v", 1:6), sep='┋') %>% 
  gather(q6, val, v1:v6) %>% 
  na.exclude %>% 
  mutate(val=paste0("Q6", val), q6=1) %>% 
  spread(val, q6)
fulldata1[grep('^Q6', names(fulldata1), value = TRUE)][is.na(fulldata1[grep('^Q6', names(fulldata1), value = TRUE)])] <- 0

Now I want to write a single function that I can pass a variable name to (Q1, Q2, Q3, ...). I wrote the code below, but it does not work.

multiplechoice <- function(Question) {
  fulldata1<-fulldata %>% 
    separate(Question, paste0("v", 1:6), sep='┋') %>% 
    gather(q6, val, v1:v6) %>% 
    na.exclude %>% 
    mutate(val=paste0("Question", val), q6=1) %>% 
    spread(val, q6)
  fulldata1[grep('^Question', names(fulldata1), value = TRUE)][is.na(fulldata1[grep('^Question', names(fulldata1), value = TRUE)])] <- 0
  return(multiplechoice)
}
multiplechoice(Q6)

Could you help point out what I am doing wrong in how I wrote this function? Thanks!

Here is a sample (thank you for reminding me):

structure(list(id = 1:10, Q6 = structure(c(2L, 4L, 1L, 7L, 5L, 
6L, 2L, 5L, 3L, 1L), .Label = c("apple", "apple;orange;blueberry", 
"apple;peach", "orange;blueberry", "orange;blueberry;peach", 
"peach", "peach;apple"), class = "factor"), Q7 = structure(c(9L, 
3L, 2L, 1L, 4L, 8L, 6L, 7L, 5L, 5L), .Label = c("cauliflower", 
"kale", "kale;spinich", "kale;spinich;cauliflower", "none", "potato;kale", 
"potato;spinich;cauliflower", "spinich; kale;cauliflower", "spinich;kale"
), class = "factor")), row.names = c(NA, 10L), class = "data.frame")

I think the path of least resistance here is data.table's tstrsplit:

library(data.table)
setDT(data)[,lapply(colnames(.SD),function(x) {
    y <- tstrsplit(.SD[[x]],";")
    setNames(as.data.table(y),paste0(paste0(x,"."),1:length(y)))
  }),
  .SDcols = setdiff(names(data),"id")]
      Q6.1      Q6.2      Q6.3        Q7.1    Q7.2        Q7.3
 1:  apple    orange blueberry     spinich    kale        <NA>
 2: orange blueberry      <NA>        kale spinich        <NA>
 3:  apple      <NA>      <NA>        kale    <NA>        <NA>
 4:  peach     apple      <NA> cauliflower    <NA>        <NA>
 5: orange blueberry     peach        kale spinich cauliflower
 6:  peach      <NA>      <NA>     spinich    kale cauliflower
 7:  apple    orange blueberry      potato    kale        <NA>
 8: orange blueberry     peach      potato spinich cauliflower
 9:  apple     peach      <NA>        none    <NA>        <NA>
10:  apple      <NA>      <NA>        none    <NA>        <NA>
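To make the splitting step easier to follow, here is a minimal sketch of what tstrsplit does on its own. The vector name answers is made up for illustration, and fixed = TRUE is simply passed through to strsplit so that ";" is treated literally.

```r
library(data.table)

# Hypothetical toy input: three answers with 3, 2 and 1 choices.
answers <- c("apple;orange;blueberry", "orange;blueberry", "apple")

# tstrsplit() splits each string and transposes the result, so element k of
# the returned list holds the k-th choice of every answer, padded with NA.
parts <- tstrsplit(answers, ";", fixed = TRUE)

length(parts)  # one list element per choice position
parts[[1]]     # first choice of each answer
parts[[3]]     # third choice; NA where an answer had fewer choices
```

Each list element becomes one column, which is why the answer above can turn the result straight into a data.table with as.data.table.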

Well, to answer your question, I'd say it's just not especially easy to use separate inside mutate. So I stopped trying and went with the simplest function that worked.

head(fulldata)
#>   id                     Q6                        Q7
#> 1  1 apple;orange;blueberry              spinich;kale
#> 2  2       orange;blueberry              kale;spinich
#> 3  3                  apple                      kale
#> 4  4            peach;apple               cauliflower
#> 5  5 orange;blueberry;peach  kale;spinich;cauliflower
#> 6  6                  peach spinich; kale;cauliflower

Use separate to split on ; (this assumes no more than 6 choices per question for now) and stringr::str_subset to get the variable names that start with "Q":

library(stringr)
library(tidyr)

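# Our own little custom function: split one question column into six
# "<question>_choice<n>" columns. No sep is given, so separate_() falls back
# to its default and splits on any run of non-alphanumeric characters, which
# covers both ";" and "; ". Note that separate_() is deprecated in current tidyr.
sep <- function(...) {
  dots <- list(...)  # dots[[1]] is the data frame, dots[[2]] the column name
  separate_(..., into = sprintf("%s_choice%d", dots[[2]], 1:6), fill = "right", remove = TRUE)
}

# Every column whose name starts with "Q"
myQuestions <- stringr::str_subset(names(fulldata), "^Q")
# The pipe passes fulldata as Reduce()'s init, so sep() is applied to
# fulldata once per question column
separated_data <- fulldata %>% Reduce(f = sep, x = myQuestions)
head(separated_data)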
#>   id Q6_choice1 Q6_choice2 Q6_choice3 Q6_choice4 Q6_choice5 Q6_choice6
#> 1  1      apple     orange  blueberry       <NA>       <NA>       <NA>
#> 2  2     orange  blueberry       <NA>       <NA>       <NA>       <NA>
#> 3  3      apple       <NA>       <NA>       <NA>       <NA>       <NA>
#> 4  4      peach      apple       <NA>       <NA>       <NA>       <NA>
#> 5  5     orange  blueberry      peach       <NA>       <NA>       <NA>
#> 6  6      peach       <NA>       <NA>       <NA>       <NA>       <NA>
#>    Q7_choice1 Q7_choice2  Q7_choice3 Q7_choice4 Q7_choice5 Q7_choice6
#> 1     spinich       kale        <NA>       <NA>       <NA>       <NA>
#> 2        kale    spinich        <NA>       <NA>       <NA>       <NA>
#> 3        kale       <NA>        <NA>       <NA>       <NA>       <NA>
#> 4 cauliflower       <NA>        <NA>       <NA>       <NA>       <NA>
#> 5        kale    spinich cauliflower       <NA>       <NA>       <NA>
#> 6     spinich       kale cauliflower       <NA>       <NA>       <NA>
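The Reduce call is the least obvious part of this answer, so here is a small base R sketch of the same pattern. The function add_suffix and the string "start" are made up for illustration; the point is only that an unnamed first argument, given that f and x are named, is matched to Reduce's init parameter.

```r
# Hypothetical accumulator function: takes the running result and the
# next element, and returns the new running result.
add_suffix <- function(acc, nm) paste0(acc, "_", nm)

# With f and x named, the remaining unnamed argument is matched to init,
# which is what happens when the pipe inserts fulldata as the first argument.
out <- Reduce("start", f = add_suffix, x = c("Q6", "Q7"))

# Equivalent to add_suffix(add_suffix("start", "Q6"), "Q7")
out
```

In the answer, the accumulator is the data frame and each step separates one more question column.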

Your data:

fulldata <- structure(list(id = 1:10, Q6 = structure(c(2L, 4L, 1L, 7L, 5L, 
6L, 2L, 5L, 3L, 1L), .Label = c("apple", "apple;orange;blueberry", 
"apple;peach", "orange;blueberry", "orange;blueberry;peach", 
"peach", "peach;apple"), class = "factor"), Q7 = structure(c(9L, 
3L, 2L, 1L, 4L, 8L, 6L, 7L, 5L, 5L), .Label = c("cauliflower", 
"kale", "kale;spinich", "kale;spinich;cauliflower", "none", "potato;kale", 
"potato;spinich;cauliflower", "spinich; kale;cauliflower", "spinich;kale"
), class = "factor")), row.names = c(NA, 10L), class = "data.frame")
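The original question asked for one 0/1 indicator column per choice (Q6apple, Q6orange, ...). That is a different layout from the choice1..choice6 columns above, so here is a rough base R sketch of one way to get it. The helper name expand_choices is hypothetical, the column name is passed as a string rather than as a bare name, and the toy data frame is a shortened stand-in for the sample data, not the real survey.

```r
# Hypothetical helper: expand one multiple-choice column into 0/1 columns.
# `data` is a data frame, `question` a column name given as a string,
# `sep` the delimiter between choices (assumed to be ";" as in the sample).
expand_choices <- function(data, question, sep = ";") {
  # Split every answer into its choices and trim stray spaces ("spinich; kale").
  choices <- lapply(strsplit(as.character(data[[question]]), sep, fixed = TRUE), trimws)
  # Add one indicator column per distinct choice, named e.g. "Q6apple".
  for (choice in sort(unique(unlist(choices)))) {
    data[[paste0(question, choice)]] <-
      as.integer(vapply(choices, function(x) choice %in% x, logical(1)))
  }
  data[[question]] <- NULL  # drop the original text column
  data
}

# Shortened, made-up stand-in for the sample data.
fulldata <- data.frame(
  id = 1:3,
  Q6 = c("apple;orange", "peach", "apple"),
  Q7 = c("kale", "spinich; kale", "none"),
  stringsAsFactors = TRUE
)

# Apply the helper once per column whose name starts with "Q".
questions <- grep("^Q", names(fulldata), value = TRUE)
result <- Reduce(expand_choices, questions, fulldata)
result
```

Passing the column name as a string avoids the problem in the question's function, where separate(Question, ...) looks for a column literally called Question, paste0("Question", val) uses that literal text as the prefix, and return(multiplechoice) returns the function itself rather than the data.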

