Split text based on exact matches from a list of patterns using R
I have some text and a list of patterns.
text <- "It is only a very poor quality car that can give big problems with automatic gearbox"
patterns <- c("very poor","big problems")
Splitting the text:
unlist(strsplit(text, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE))
Output:
[1] "It" "is" "only" "a" "very" "poor" "quality" "car" "that" "can"
[11] "give" "big" "problems" "with" "automatic" "gearbox"
What I need is to match the list of patterns in the sentence, so that instead of "very" "poor" the result contains the single element "very poor", and likewise for "big problems".
Sample output:
[1] "It" "is" "only" "a" "very poor" "quality" "car" "that" "can"
[10] "give" "big problems" "with" "automatic" "gearbox"
How should I do this?
This is one approach:
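library(stringr)
text <- "It is only a very poor quality car that can give big problems with automatic gearbox"
patterns <- c("very poor","big problems")
# Named vector: each pattern maps to a copy with its spaces replaced by "&&"
patterns_ns <- setNames(str_replace_all(patterns, " ", "&&"), patterns)
# Protect the multi-word patterns in the text so they survive the split
text_ns <- str_replace_all(text, patterns_ns)
# Split on whitespace, then restore the spaces inside the protected patterns
text_split <- str_replace_all(unlist(str_split(text_ns, "\\s")), "&&", " ")
text_split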
I've assumed that "&&" is a string that doesn't actually appear in your source text, and that you want to split at whitespace.
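If the placeholder assumption is a concern, a possible alternative is to extract the tokens directly instead of splitting. The following is only a sketch in base R, not part of the approach above: it builds one regular expression that tries each phrase first and falls back to a run of non-space characters. The names ordered, quoted, token_regex and tokens are made up for illustration. It assumes that the phrases begin and end with word characters (so that \b behaves as expected) and that splitting at whitespace is enough; unlike the strsplit call in the question, it does not separate punctuation.

```r
text <- "It is only a very poor quality car that can give big problems with automatic gearbox"
patterns <- c("very poor", "big problems")

# Try longer phrases first so a shorter phrase cannot pre-empt a longer one.
ordered <- patterns[order(nchar(patterns), decreasing = TRUE)]

# \Q...\E (PCRE, hence perl = TRUE below) makes each phrase match literally.
quoted <- paste0("\\Q", ordered, "\\E")

# A token is either one of the phrases (at word boundaries) or a run of non-space characters.
token_regex <- paste0("\\b(?:", paste(quoted, collapse = "|"), ")\\b|\\S+")

# gregexpr finds all matches; regmatches extracts them as a character vector.
tokens <- regmatches(text, gregexpr(token_regex, text, perl = TRUE))[[1]]
tokens
```

For the sample sentence this should give the same 14 elements as the desired output in the question, with "very poor" and "big problems" each kept as a single element.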