R: Extract a pattern that occurs a different number of times

Question

I have the following problem: I have a text, separated into chapters and stored in a vector. Suppose something like:

text <- c("Here are information about topic1.", 
"Here are some information about topic2 or topic3.", 
"Chapter number 4 is really annoying.", 
"Topic4 is discussed in this chapter.")

I want to extract the different topics mentioned in the different chapters, so my output should be something like:

output
      [1]       [2]
[1] "topic1"
[2] "topic2" "topic3"
[3]
[4] "Topic4"

So some rows have multiple matches and some have none.

I tried str_extract_all and then unlisting the resulting list, but ran into problems because the rows have different numbers of elements.
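To make the problem concrete, here is a minimal sketch of what goes wrong. It assumes the stringr package and the pattern `[Tt]opic\\d+`; the question names neither the pattern nor the exact call, so both are illustrative choices.

```r
library(stringr)

text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

# One character vector of matches per chapter
found <- str_extract_all(text, "[Tt]opic\\d+")

# The chapters yield 1, 2, 0 and 1 matches respectively
n_matches <- lengths(found)

# unlist() flattens everything into one vector, so the information about
# which chapter each match came from is lost, and the empty chapter vanishes
flat <- unlist(found)
```

Here `flat` has four elements even though there are four chapters with uneven match counts, so it cannot be reshaped into one row per chapter without padding.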

Thanks to all!

Answer 1

You can use rbind.fill.matrix from plyr.

text <- c("Here are information about topic1.", 
          "Here are some information about topic2 or topic3.", 
          "Chapter number 4 is really annoying.", 
          "Topic4 is discussed in this chapter.")

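library(stringr)
library(plyr)

# Extract all matches per chapter; the result is a list of character vectors
xy <- str_extract_all(text, pattern = "[Tt]opic\\d+")
# Turn each vector into a one-row matrix; lapply always returns a list,
# whereas sapply would simplify when every chapter has the same number of matches
xy <- lapply(xy, FUN = function(x) matrix(x, nrow = 1))
# Stack the rows, padding the shorter ones with NA
rbind.fill.matrix(xy) # from plyr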

     1        2       
[1,] "topic1" NA      
[2,] "topic2" "topic3"
[3,] NA       NA      
[4,] "Topic4" NA
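If adding plyr is not desirable, the same padding idea can be sketched in base R. This is an alternative under stated assumptions, not part of the answer above: `regmatches`/`gregexpr` stand in for `str_extract_all`, and the names `matches`, `width`, `padded` and `result` are made up for illustration.

```r
text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

# One character vector of matches per chapter (character(0) when none)
matches <- regmatches(text, gregexpr("[Tt]opic[0-9]+", text))

# Width of the widest row, i.e. the largest number of matches in a chapter
width <- max(lengths(matches))

# Extending a vector's length pads it with NA
padded <- lapply(matches, function(x) {
  length(x) <- width
  x
})

# Stack the equal-length vectors as rows of a character matrix
result <- do.call(rbind, padded)
```

For this input `result` is a 4 x 2 character matrix with NA in the unfilled cells. If empty strings are acceptable instead of NA, stringr can also return a matrix directly via `str_extract_all(text, "[Tt]opic\\d+", simplify = TRUE)`.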

Answer 1: 4, accepted, 2017-04-12 08:25:59