R: extracting pattern, different times
I have the following problem: I have a text, separated into chapters and stored in a character vector. Suppose something like:
text <- c("Here are information about topic1.",
"Here are some information about topic2 or topic3.",
"Chapter number 4 is really annoying.",
"Topic4 is discussed in this chapter.")
I want to extract the topics mentioned in each chapter. My output should look something like:
output
[1] [2]
[1] "topic1"
[2] "topic2" "topic3"
[3]
[4] "Topic4"
So some rows have multiple matches and some have none.
I tried str_extract_all and then unlisting the result, but ran into problems because the rows have different numbers of elements.
Thanks to all!
You can use rbind.fill.matrix from plyr.
text <- c("Here are information about topic1.",
"Here are some information about topic2 or topic3.",
"Chapter number 4 is really annoying.",
"Topic4 is discussed in this chapter.")
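library(stringr)
library(plyr)
# Extract every topic per chapter; the result is a list with one character vector per chapter
xy <- str_extract_all(text, pattern = "[Tt]opic\\d+")
# Turn each vector into a one-row matrix; lapply always returns a list, whereas sapply may simplify it
xy <- lapply(xy, FUN = function(x) matrix(x, nrow = 1))
# Stack the rows, padding the shorter ones with NA
rbind.fill.matrix(xy) # from plyr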
1 2
[1,] "topic1" NA
[2,] "topic2" "topic3"
[3,] NA NA
[4,] "Topic4" NA
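If you would rather not depend on plyr, the same padding idea can be sketched in base R. This is only an illustrative alternative, not part of the answer above: the names matches, n_max and out are made up for the example, and it assumes that at least one chapter contains a match.

```r
text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

# One character vector of matches per chapter (character(0) if none)
matches <- regmatches(text, gregexpr("[Tt]opic[0-9]+", text))

# Widest row determines the number of columns
n_max <- max(lengths(matches))

# Indexing past the end of a vector yields NA, which pads short rows
out <- do.call(rbind, lapply(matches, function(m) m[seq_len(n_max)]))
out
```

stringr itself also offers str_extract_all(text, pattern, simplify = TRUE), which returns a character matrix directly; note that it pads with empty strings rather than NA.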