
Apply row to values between nominal rows

I have no doubt this question has been asked before, but I cannot for the life of me figure out how to word it in a way that I can find the response.

I have the following data coming in from a .csv file:

1 Q1. Do you run on trails?                                                                                    NA     NA   
2 YES                                                                                                          97.17% 2507 
3 NO                                                                                                           2.83%  73   
4 Q2. Do you participate in organized trail work, maintenance, building, or cleaning up trails (ie: plogging)? NA     NA   
5 YES                                                                                                          49.88% 1283 
6 NO                                                                                                           50.12% 1289 

The questions and possible responses aren't all the same, so the workflow I imagine is:

  • For every row that matches "Q\\D?"
  • Write a column with that value
  • For each row before the next match.

Ideally, the end result would be:

Q1...   YES  10% 435
Q1...   NO   90% 783
Q2...   YES  10% 435
Q2...   NO   90% 783

Sorry, I had to edit; I finally got it.

  1. Save your sheet as CSV, using , as the separator and ' as the string delimiter.

  2. Run this code.

Please let me know if you have any concerns or doubts. Notice that I read the file as text using readLines() and then split each line on the comma character, except for the question lines, where I use the string delimiter instead. It is dirty but it works.

Best

JA

  • Load packages
library(data.table)
library(stringr)
  • Read data as text
dat <- readLines("~/Documents/test.SO/test1.csv")
  • Establish which lines are questions with a grep. This is important since we will loop over each of those lines. The premise is: from each question line until the next, all lines must be answers.
# "[0-9]+" also matches question numbers with more than one digit (Q10, Q11, ...)
qlines <- grep("Q[0-9]+\\.", dat)
  • Set up an empty list to store the questions in, and a counter for the list element we are going to store
all.questions <- list()
i <- 1

Now here is the sweet stuff, by steps:

  • S1: we extract the text of the question itself with dat[q]; since we loop q over qlines, we already know which lines are questions. Remember, from this line to the next question all lines are answers, unless this is the last question, in which case all lines from here to the end of the file are answers; that is why the if is there. The sub just extracts the text between the string delimiters you used when saving, i.e. '.
  • S2: break apart the text line with the answer, which should contain the answer itself, the percent, and the number of people who answered. With unlist(str_split(dat[a], ",")) we split the line at each ",", the field delimiter, into a character vector whose pieces come in the order stated above. So ans.dat[1] is the answer itself, the next element is the percent, and so on. Assigning each piece to its own variable (percent <- ans.dat[2] and so on) just extracts the information step by step, so that at the end we can build the table the way we like.
  • S3: now that we have the separate elements, we assemble one row of the table. Remember, this internal loop runs once per answer, so this is the row for that answer. The syntax is just data.table syntax (sorry, I am no longer familiar with data.frame syntax). Then we rbind it to the rows collected so far.

The internal loop exhausts the answers for the current question; the external loop exhausts the questions in the text.
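
As a quick check of S1 and S2 on single lines, here is a minimal sketch. The question line is a shortened, hypothetical example in the format from step 1 (the answer values are taken from the question), and base strsplit() stands in for str_split() so the snippet needs no packages.

```r
# Hypothetical sample lines: ',' as separator, ' as string delimiter
q.line <- "'Q2. Do you do trail work, maintenance, or cleanup?',,"
a.line <- "YES,49.88%,1283"

# S1: keep only the text between the single quotes
question <- sub(".*'([^']*)'.*", "\\1", q.line)

# S2: split the answer line on the field delimiter
ans.dat    <- unlist(strsplit(a.line, ",", fixed = TRUE))
ans        <- ans.dat[1]   # the answer itself
percent    <- ans.dat[2]   # the percent
responders <- ans.dat[3]   # the number of people who answered
```

Note that sub() returns its input unchanged when the pattern does not match, so a question line that was saved without quotes comes through whole, trailing commas included.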

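Side note: you can eliminate the remaining trailing commas by adding a second sub, question <- sub("[ ,]+$", "", question), after the internal loop closes. The "+" matters: a pattern that matches a single character at the end of the string strips only the last comma.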
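
A small sketch of that cleanup, using the unquoted question string as it appears in the output further down. It compares a pattern that matches one trailing character with one that matches the whole run of trailing spaces and commas; the variable names are only for illustration.

```r
question <- "Q1. Do you run on trails? ,,"

# "$" anchors the match at the very end, so only the final comma is removed
one.char <- gsub("( |,)$", "", question)

# "[ ,]+$" matches the whole run of trailing spaces and commas
all.trailing <- sub("[ ,]+$", "", question)
```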

for(q in qlines){

  question <- sub(".*'([^']*)'.*", "\\1", dat[q]) #S1

  if(which(q==qlines) == length(qlines)){
    ans.lines <- (q+1):length(dat)
  }else{
    ans.lines <- (q+1) : (qlines[which(qlines==q)+1] - 1)
  }

  all.answers <- data.table()

  for(a in ans.lines){

   ans.dat <-  unlist(str_split(dat[a], ",")) #S2
   ans <- ans.dat[1]
   percent <- ans.dat[2]
   responders <- ans.dat[3]
   ans.row <- data.table("ans"=ans, "percent"=percent, "responders"=responders) #S3
   all.answers <- rbind(all.answers, ans.row)

  }

  all.questions[[i]] <- question.table <- cbind(question, all.answers)
  i <- i+1

}

all.questions

[[1]]
                       question ans percent responders
1: Q1. Do you run on trails? ,, YES      50        100
2: Q1. Do you run on trails? ,,  NO      50        100

[[2]]
                                                                                                       question ans percent responders
1: Q2. Do you participate in organized trail work, maintenance, building, or cleaning up trails (ie: plogging)? YES      50        100
2: Q2. Do you participate in organized trail work, maintenance, building, or cleaning up trails (ie: plogging)?  NO      50        100

[[3]]
                    question    ans percent responders
1: Q3. What is your gender,,   MALE      50        100
2: Q3. What is your gender,, FEMALE      49         99
3: Q3. What is your gender,,  OTHER       1          1
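
For comparison, the same premise (every line after a question belongs to that question) can probably be written without loops. This is only a sketch in base R, not the code that produced the output above. It assumes the first line is a question and that every answer line has exactly three fields; the sample lines and the name result are made up for illustration.

```r
# Made-up sample lines; quoted and unquoted question lines both occur
dat <- c(
  "Q1. Do you run on trails? ,,",
  "YES,50,100",
  "NO,50,100",
  "'Q2. Do you do trail work, maintenance, or cleanup?',,",
  "YES,50,100",
  "NO,50,100",
  "Q3. What is your gender,,",
  "MALE,50,100",
  "FEMALE,49,99",
  "OTHER,1,1"
)

is.q <- grepl("Q[0-9]+\\.", dat)   # TRUE on question lines
grp  <- cumsum(is.q)               # index of the question each line belongs to

# Question text: strip the quotes if present, then any trailing spaces/commas
questions <- sub(".*'([^']*)'.*", "\\1", dat[is.q])
questions <- sub("[ ,]+$", "", questions)

# Answer lines: one row per line, one column per field
fields <- do.call(rbind, strsplit(dat[!is.q], ",", fixed = TRUE))

result <- data.frame(
  question   = questions[grp[!is.q]],
  ans        = fields[, 1],
  percent    = fields[, 2],
  responders = fields[, 3],
  stringsAsFactors = FALSE
)
result
```

The cumsum() over the logical vector is what carries each question down to its answers: the counter goes up by one at every question line and stays put on the answer lines that follow.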
