R: extracting pattern, different times

I have the following problem: I have a text, split into chapters and stored in a vector. Suppose something like:

text <- c("Here are information about topic1.", 
"Here are some information about topic2 or topic3.", 
"Chapter number 4 is really annoying.", 
"Topic4 is discussed in this chapter.")

I want to extract the different topics mentioned in each chapter, so my output should look something like:

output
     [1]      [2]
[1]  "topic1"
[2]  "topic2" "topic3"
[3]
[4]  "Topic4"

So some rows have multiple matches and some have none.

I tried str_extract_all and then unlisting the resulting list, but ran into problems because the rows have different numbers of elements.

Thanks to all!
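A minimal sketch of where the structure gets lost and one way to keep it. It uses base R's regmatches() with gregexpr() as a stand-in for stringr::str_extract_all(); both return one character vector per chapter, with character(0) where nothing matches. A bare unlist() drops the information about which chapter each match came from, but repeating each chapter index by lengths() preserves it, assuming a long table (one row per match) is acceptable instead of one row per chapter. The names matches, n_per_chapter and topics_long are illustrative only.

```r
text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

# Base-R equivalent of stringr::str_extract_all(): a list with one
# character vector of matches per chapter (character(0) if none)
matches <- regmatches(text, gregexpr("[Tt]opic[0-9]+", text))

# Number of matches found in each chapter (0 for the third chapter)
n_per_chapter <- lengths(matches)

# Long format: repeat each chapter index once per match, so the
# chapter association survives unlist()
topics_long <- data.frame(
  chapter = rep(seq_along(matches), n_per_chapter),
  topic   = unlist(matches),
  stringsAsFactors = FALSE
)
print(topics_long)
```

Chapters without a match simply do not appear in the long table; their count of zero is still visible in n_per_chapter.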

You can use rbind.fill.matrix from the plyr package.

text <- c("Here are information about topic1.", 
          "Here are some information about topic2 or topic3.", 
          "Chapter number 4 is really annoying.", 
          "Topic4 is discussed in this chapter.")

library(stringr)
library(plyr)

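# one character vector of matches per chapter (character(0) if none)
xy <- str_extract_all(text, pattern = "[Tt]opic\\d+")
# turn each vector into a one-row matrix; lapply always returns a list,
# whereas sapply could simplify to a matrix if all chapters had equal match counts
xy <- lapply(xy, FUN = function(x) matrix(x, nrow = 1))
# stack the one-row matrices, filling missing columns with NA
rbind.fill.matrix(xy) # from plyr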

     1        2       
[1,] "topic1" NA      
[2,] "topic2" "topic3"
[3,] NA       NA      
[4,] "Topic4" NA 
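If you would rather avoid the plyr dependency, here is a base-R sketch (an alternative, not part of the answer above) that produces the same layout: pad each chapter's matches with NA up to the largest match count via `length<-`, then stack the padded vectors with do.call(rbind, ...). It again uses regmatches()/gregexpr() in place of str_extract_all(); the names matches, max_n, padded and topic_mat are illustrative.

```r
text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

# One character vector of matches per chapter (character(0) if none)
matches <- regmatches(text, gregexpr("[Tt]opic[0-9]+", text))

# Pad every chapter's matches with NA up to the largest number of matches;
# `length<-` extends a character vector with NA_character_
max_n <- max(lengths(matches))
padded <- lapply(matches, function(x) {
  length(x) <- max_n
  x
})

# Stack into a character matrix: one row per chapter, one column per match
topic_mat <- do.call(rbind, padded)
print(topic_mat)
```

Unlike rbind.fill.matrix(), this returns a matrix without column names; the number of columns equals the largest number of matches in any single chapter.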
