Converting rows into a categorical column using R

I have a transcribed interview, and the data is organized as follows:

[1,]  "Interviewer"
[2,]  "What is your favorite food?"
[3,]  "Interviewee"
[4,]  "I love to eat pizza"
[5,]  "Interviewer"
[6,]  "Cool. But have you ever tried eating salad?"
[7,]  "Interviewee "
[8,]  "Yeah..."
[9,]  "Interviewer"
[10,] "I love salad, pizza is bad."
[11,] "Interviewee "
[12,] "I don't totally agree" 

I would like to remove the speaker's name from those rows and turn it into a categorical column, as in this example:

      [,1]                [,2]  
[1,]  "Interviewer"       "What is your favorite food?"
[2,]  "Interviewee"       "I love to eat pizza"
[3,]  "Interviewer"       "Cool. But have you ever tried eating salad?"
[4,]  "Interviewee"       "Yeah..."
[5,]  "Interviewer"       "I love salad, pizza is bad."
[6,]  "Interviewee"       "I don't totally agree"

The interview is a conversation between two people. Does anyone know how to do this? Thanks in advance!

Here is an alternative approach using the tidyverse:

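library(tidyverse)

tibble(v1 = v1) %>% 
  mutate(v2 = lead(v1)) %>%            # put the next element (the utterance) beside each row
  filter(row_number() %% 2 == 1) %>%   # keep only the odd rows, which hold the speaker labels
  as.matrix()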

     v1             v2                                           
[1,] "Interviewer"  "What is your favorite food?"                
[2,] "Interviewee"  "I love to eat pizza"                        
[3,] "Interviewer"  "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."                                    
[5,] "Interviewer"  "I love salad, pizza is bad."                
[6,] "Interviewee " "I don't totally agree" 
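The matrix above still contains "Interviewee " with a trailing space, so a factor built directly from its first column would have three levels instead of two. If a true categorical column is the goal, a minimal base R sketch (assuming labels and utterances strictly alternate; the names interview, speaker and text are illustrative only) trims the whitespace and stores the labels as a factor in a data frame:

```r
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# Odd positions hold the speaker labels, even positions the utterances
# (assumes the two strictly alternate)
idx <- seq(1, length(v1), by = 2)

interview <- data.frame(
  speaker = factor(trimws(v1[idx])),  # trimws() drops the trailing space in "Interviewee "
  text    = v1[idx + 1],
  stringsAsFactors = FALSE
)

levels(interview$speaker)
# c("Interviewee", "Interviewer")
```

Without trimws(), levels(interview$speaker) would also contain "Interviewee " as a separate third level.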

We can create a grouping variable by applying grepl to the 'Interview' keyword and taking the cumulative sum, then split the vector by that variable and rbind the pieces:

do.call(rbind, split(v1, cumsum(grepl("^Interview", v1))))

-output

 [,1]           [,2]                                         
1 "Interviewer"  "What is your favorite food?"                
2 "Interviewee"  "I love to eat pizza"                        
3 "Interviewer"  "Cool. But have you ever tried eating salad?"
4 "Interviewee " "Yeah..."                                    
5 "Interviewer"  "I love salad, pizza is bad."                
6 "Interviewee " "I don't totally agree"        
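To see what the grouping variable looks like, and why split() then yields one piece per speaker turn, here is a sketch that prints it. The collapse_turns() helper is hypothetical and assumes the first element is a speaker label; it shows one way to keep the same idea working if an utterance were spread over several lines, in which case the pieces would have unequal lengths and rbind() would recycle the shorter ones.

```r
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# TRUE wherever an element starts with "Interview", i.e. a speaker label
# (this also matches "Interviewee " with its trailing space)
is_label <- grepl("^Interview", v1)

# The running count of labels goes up by one at every label, so each label
# and the line(s) that follow it share one group id
grp <- cumsum(is_label)
print(grp)
# [1] 1 1 2 2 3 3 4 4 5 5 6 6

# Hypothetical helper for turns that span several lines: keep the (trimmed)
# label and paste the remaining lines of each group together, so every group
# yields exactly two values before rbind()
collapse_turns <- function(x, pattern = "^Interview") {
  groups <- split(x, cumsum(grepl(pattern, x)))
  do.call(rbind, lapply(groups, function(p) {
    c(trimws(p[1]), paste(p[-1], collapse = " "))
  }))
}

turns <- collapse_turns(v1)
```

Note that any utterance that itself begins with "Interview" would be treated as a new label by this pattern.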

If the labels and utterances strictly alternate, either use a recycled logical index to create the two columns

cbind(v1[c(TRUE, FALSE)], v1[c(FALSE, TRUE)])
     [,1]           [,2]                                         
[1,] "Interviewer"  "What is your favorite food?"                
[2,] "Interviewee"  "I love to eat pizza"                        
[3,] "Interviewer"  "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."                                    
[5,] "Interviewer"  "I love salad, pizza is bad."                
[6,] "Interviewee " "I don't totally agree"   
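The recycling trick relies on v1 having an even length. A minimal sketch, using a hypothetical pair_alternating() helper, that spells out how the length-2 logical index is recycled and checks the length before pairing:

```r
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# Hypothetical helper. A length-2 logical index is recycled along x:
# c(TRUE, FALSE) keeps positions 1, 3, 5, ... and c(FALSE, TRUE) keeps 2, 4, 6, ...
pair_alternating <- function(x) {
  # With an odd number of elements the two halves differ in length and cbind()
  # would recycle the shorter one (with a warning), so fail early instead
  if (length(x) %% 2 != 0) stop("expected an even number of elements")
  cbind(speaker = x[c(TRUE, FALSE)], text = x[c(FALSE, TRUE)])
}

m <- pair_alternating(v1)
```

Naming the cbind() arguments also gives the result readable column names instead of [,1] and [,2].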

Or use matrix

matrix(v1, ncol = 2, byrow = TRUE)
     [,1]           [,2]                                         
[1,] "Interviewer"  "What is your favorite food?"                
[2,] "Interviewee"  "I love to eat pizza"                        
[3,] "Interviewer"  "Cool. But have you ever tried eating salad?"
[4,] "Interviewee " "Yeah..."                                    
[5,] "Interviewer"  "I love salad, pizza is bad."                
[6,] "Interviewee " "I don't totally agree"                
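The byrow = TRUE argument is what makes this work: matrix() fills column by column by default. A short comparison sketch:

```r
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# byrow = TRUE fills row 1 left to right, then row 2, and so on,
# so each label lands next to the utterance that follows it
by_row <- matrix(v1, ncol = 2, byrow = TRUE)

# The default (byrow = FALSE) fills column 1 top to bottom first,
# which puts the first six elements in column 1 and breaks the pairing
by_col <- matrix(v1, ncol = 2)

by_row[2, ]  # c("Interviewee", "I love to eat pizza")
by_col[2, ]  # c("What is your favorite food?", "Yeah...")
```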

data

v1 <- c("Interviewer", "What is your favorite food?", "Interviewee", 
"I love to eat pizza", "Interviewer", 
"Cool. But have you ever tried eating salad?", 
"Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.", 
"Interviewee ", "I don't totally agree")
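The recycled-index and matrix() answers silently mispair rows if the transcript ever breaks the label/utterance rhythm. A small check, sketched with a hypothetical alternates_cleanly() helper and an assumed stricter label pattern, can be run on the data before reshaping:

```r
v1 <- c("Interviewer", "What is your favorite food?", "Interviewee",
        "I love to eat pizza", "Interviewer",
        "Cool. But have you ever tried eating salad?",
        "Interviewee ", "Yeah...", "Interviewer", "I love salad, pizza is bad.",
        "Interviewee ", "I don't totally agree")

# Hypothetical helper. The assumed label pattern accepts only elements that are
# exactly "Interviewer" or "Interviewee", optionally followed by whitespace
alternates_cleanly <- function(x, pattern = "^Interview(er|ee)\\s*$") {
  is_label <- grepl(pattern, x)
  # Labels must sit at every odd position and at no even position
  length(x) %% 2 == 0 &&
    all(is_label[c(TRUE, FALSE)]) &&
    !any(is_label[c(FALSE, TRUE)])
}

alternates_cleanly(v1)  # TRUE for the data above
```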

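Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.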

© 2020-2024 STACKOOM.COM