Apply row to values between nominal rows

I have no doubt this question has been asked before, but I cannot for the life of me figure out how to word it in a way that lets me find the answer.

I have the following data coming in from a .csv:

1 Q1. Do you run on trails?                                                                                    NA     NA   
2 YES                                                                                                          97.17% 2507 
3 NO                                                                                                           2.83%  73   
4 Q2. Do you participate in organized trail work, maintenance, building, or cleaning up trails (ie: plogging)? NA     NA   
5 YES                                                                                                          49.88% 1283 
6 NO                                                                                                           50.12% 1289 

The questions and possible responses aren't all the same, so the workflow I imagine is:

  • For every row that matches "Q\\D?"
  • Write a column with that value
  • For each row before the next match.

Ideally, the end result would be:

Q1...   YES  10% 435
Q1...   NO   90% 783
Q2...   YES  10% 435
Q2...   NO   90% 783

Sorry I had to edit, I finally got it.

  1. Save your sheet as CSV using , as the separator and ' as the string delimiter.

  2. Run this code.

Please let me know about any concern or doubt. Notice that I read the file as text using readLines() and then use the comma character to split the lines, except for the question, where I use the string delimiter. It is dirty but it works.

Best

JA

  • Load packages
library(data.table)
library(stringr)
  • Read data as text
dat <- readLines("~/Documents/test.SO/test1.csv")
  • Establish which lines are questions with a grep. This matters because we will loop over those lines; the premise is that every line from one question line up to the next must be an answer. The + after [0-9] lets question numbers with more than one digit (Q10, Q11, ...) match as well.
qlines <- grep("Q[0-9]+\\.", dat)
  • Set up an empty list to store the questions in, and a counter for the list element we are about to store
all.questions <- list()
i <- 1
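
To make the grep step concrete, here is a small sketch that runs it on an in-memory vector instead of a file. The sample lines are hypothetical: they only imitate what readLines() might return for a sheet saved with , as the separator and ' as the string delimiter. The pattern matches a Q, one or more digits, and a literal dot.

```r
# Hypothetical sample lines standing in for the result of readLines()
dat <- c(
  "Q1. Do you run on trails? ,,",
  "YES,50,100",
  "NO,50,100",
  "'Q2. Do you run, walk, or hike on trails?',,",
  "YES,50,100",
  "NO,50,100"
)

# Indices of the lines that look like questions
qlines <- grep("Q[0-9]+\\.", dat)
qlines
# [1] 1 4
```

Everything between index 1 and index 4, and everything after index 4, is then treated as an answer line.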

-Now here is the sweet stuff, step by step:

  • S1: Extract the text of the question itself with dat[q]; since we loop over q, we already know which lines are questions. Remember, every line from this one up to the next question is an answer, unless this is the last question, in which case every line from here to the end of the file is an answer; that is why the if is there. The sub just extracts the text between the string delimiters you saved the file with, i.e. '
  • S2: Break apart the answer line, which should contain the answer itself, the percent, and the number of people who answered. unlist(str_split(dat[a], ",")) splits the line at each ",", the field delimiter, giving a character vector whose pieces come in that order. So ans.dat[1] is the answer itself, the next element is the percent, and so on. Assignments like percent <- ans.dat[2] just pull each piece into its own variable, so that at the end we can build a table laid out the way we want.
  • S3: Now that we have the separate elements, we assemble one row of the table. The inner loop runs once per answer, so this is the row for that answer. The syntax is just data.table syntax (sorry, I am no longer familiar with data.frame syntax). Then we rbind it onto the answers collected so far.
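
The three steps can be tried on a single question line and a single answer line. This is only a sketch under a few assumptions: the two input strings are hypothetical, base strsplit() stands in for stringr's str_split(), and a data.frame stands in for the data.table, so that the snippet runs without extra packages.

```r
# Hypothetical raw lines in the layout described above
q.line <- "'Q2. Do you run, walk, or hike?',,"
a.line <- "YES,50,100"

# S1: keep only the text between the ' delimiters
question <- sub(".*'([^']*)'.*", "\\1", q.line)

# S2: split the answer line on the field delimiter
# (base strsplit() used here in place of stringr::str_split())
ans.dat <- strsplit(a.line, ",", fixed = TRUE)[[1]]
ans <- ans.dat[1]
percent <- ans.dat[2]
responders <- ans.dat[3]

# S3: assemble one row (a data.frame used here in place of a data.table)
ans.row <- data.frame(ans = ans, percent = percent, responders = responders)
cbind(question, ans.row)
```

Note that the commas inside the quoted question survive S1, which is the reason the question is extracted by its string delimiter rather than split on commas.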

The inner loop exhausts the answers for the current question; the outer loop exhausts the questions in the text.

Side note: you can eliminate the remaining commas by adding a second sub, question <- sub("[ ,]+$", "", question), after the internal loop closes.
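
As a quick check of that cleanup, the sketch below applies it to one of the question strings as it appears in the output further down. The pattern [ ,]+$ strips the whole run of trailing spaces and commas; a pattern that matches only a single trailing character, such as ( |,)$, leaves part of the run behind.

```r
question <- "Q1. Do you run on trails? ,,"

# Strip every trailing space and comma in one pass
sub("[ ,]+$", "", question)
# [1] "Q1. Do you run on trails?"

# A single-character pattern removes only the last comma
gsub("( |,)$", "", question)
# [1] "Q1. Do you run on trails? ,"
```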

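for(q in qlines){

  # S1: pull the question text out from between the ' delimiters
  question <- sub(".*'([^']*)'.*", "\\1", dat[q])

  # Answer lines run up to the next question, or to the end of the file
  # when this is the last question
  if(which(q==qlines) == length(qlines)){
    ans.lines <- (q+1):length(dat)
  }else{
    ans.lines <- (q+1) : (qlines[which(qlines==q)+1] - 1)
  }

  all.answers <- data.table()

  for(a in ans.lines){

    # S2: split the answer line into answer, percent, responders
    ans.dat <- unlist(str_split(dat[a], ","))
    ans <- ans.dat[1]
    percent <- ans.dat[2]
    responders <- ans.dat[3]

    # S3: build one row and append it to this question's answers
    ans.row <- data.table("ans"=ans, "percent"=percent, "responders"=responders)
    all.answers <- rbind(all.answers, ans.row)

  }

  # Attach the question text to every answer row and store the table
  all.questions[[i]] <- question.table <- cbind(question, all.answers)
  i <- i+1

}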

all.questions

[[1]]
                       question ans percent responders
1: Q1. Do you run on trails? ,, YES      50        100
2: Q1. Do you run on trails? ,,  NO      50        100

[[2]]
                                                                                                       question ans percent responders
1: Q2. Do you participate in organized trail work, maintenance, building, or cleaning up trails (ie: plogging)? YES      50        100
2: Q2. Do you participate in organized trail work, maintenance, building, or cleaning up trails (ie: plogging)?  NO      50        100

[[3]]
                    question    ans percent responders
1: Q3. What is your gender,,   MALE      50        100
2: Q3. What is your gender,, FEMALE      49         99
3: Q3. What is your gender,,  OTHER       1          1
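
The question asked for a single table with the question repeated on each answer row, while all.questions is a list with one table per question. A minimal sketch of the last step, assuming every element has the same four columns, is to row-bind the list. The list below is a hypothetical stand-in built from plain data.frames so that the snippet runs in base R; with data.table loaded, rbindlist(all.questions) is the package's own function for the same job.

```r
# Hypothetical stand-in for all.questions: one small table per question
all.questions <- list(
  data.frame(question = "Q1. Do you run on trails?",
             ans = c("YES", "NO"),
             percent = c("50", "50"),
             responders = c("100", "100")),
  data.frame(question = "Q3. What is your gender",
             ans = c("MALE", "FEMALE", "OTHER"),
             percent = c("50", "49", "1"),
             responders = c("100", "99", "1"))
)

# Stack the per-question tables into one long table
combined <- do.call(rbind, all.questions)
combined
```

The result has one row per answer, with the question text in the first column.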
