Split text in R based on exact matches from a list of patterns

Question

I have a text and a list of patterns.

text <- "It is only a very poor quality car that can give big problems with automatic gearbox" 
patterns <- c("very poor","big problems")

Splitting the text:

unlist(strsplit(text, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE))

Output:

[1] "It"        "is"        "only"      "a"         "very"      "poor"      "quality"   "car"       "that"      "can"      
[11] "give"      "big"       "problems"  "with"      "automatic" "gearbox"

What I need is to match the list of patterns in the sentence, so that instead of "very" "poor" the result contains "very poor", and likewise for "big problems".

Desired output:

[1] "It"     "is"     "only"    "a"    "very poor"   "quality"   "car"  "that"   "can"      
[10] "give"   "big problems"  "with"   "automatic"   "gearbox"

How should I do this?

Answer 1

This is one approach:

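library(stringr)
text <- "It is only a very poor quality car that can give big problems with automatic gearbox"
patterns <- c("very poor","big problems")
# Map each pattern to a copy whose spaces are replaced by the placeholder "&&"
patterns_ns <- setNames(str_replace_all(patterns, " ", "&&"), patterns)
# Protect the multi-word patterns in the text so that splitting leaves them intact
text_ns <- str_replace_all(text, patterns_ns)
# Split on whitespace, then turn the placeholder back into a space
text_split <- str_replace_all(unlist(str_split(text_ns, "\\s")), "&&", " ")
text_split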

I've assumed that "&&" is a string that doesn't actually appear in your source text, and that you want to split at whitespace.
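If no safe placeholder string is available, a possible alternative is to extract the tokens directly instead of splitting. The sketch below is not part of the original answer. It uses only base R and assumes that the patterns should be matched literally, on word boundaries, and that everything else is tokenised on whitespace. The names `ordered`, `phrase_alt`, `token_regex` and `tokens` are illustrative.

```r
text <- "It is only a very poor quality car that can give big problems with automatic gearbox"
patterns <- c("very poor", "big problems")

# Try longer phrases first, so a long phrase wins over a shorter one with the same prefix
ordered <- patterns[order(nchar(patterns), decreasing = TRUE)]

# \Q...\E makes PCRE treat each phrase literally; \b restricts matches to word boundaries
phrase_alt <- paste0("\\b\\Q", ordered, "\\E\\b", collapse = "|")

# A token is either one of the phrases or, failing that, a run of non-whitespace characters
token_regex <- paste0(phrase_alt, "|\\S+")

# Extract all tokens in order of appearance
tokens <- regmatches(text, gregexpr(token_regex, text, perl = TRUE))[[1]]
tokens
```

Because the phrases come before `\\S+` in the alternation, a phrase is preferred wherever it starts, and the remaining words fall through to the generic branch. Punctuation stays attached to the neighbouring word here, which may or may not be what you want.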

Answer 1: score 2, accepted, 2019-02-20 21:20:05
