How to repeat (\\w+\\W+) four times in a regex (R gsub)

Question

(In R gsub()) I need to capture the four words that occur after a particular phrase in a longer string. Building on the wisdom offered here, I came up with: ^.*\\b(particular phrase)\\W+(\\w+\\W+\\w+\\W+\\w+\\W+\\w+).*$

For example:

this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."
this_pattern <- "^.*\\b(particular phrase)\\W+(\\w+\\W+\\w+\\W+\\w+\\W+\\w+).*$"
gsub(this_pattern, "\\2", this_txt, ignore.case = TRUE)
# [1] "Extract These Words Please"

But the repetition of \\w+\\W+ in the pattern is pretty unseemly. Surely there is a better way. I thought ^.*\\b(particular phrase)\\W+(\\w+\\W+){4}.*$ might work, but it doesn't.
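The {4} attempt fails because a quantified capturing group is overwritten on each repetition, so it retains only its last iteration. A minimal check, assuming the PCRE engine (perl = TRUE is a choice made for this sketch), shows that \\2 comes back as just the final word plus its trailing separator:

```r
this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."

# Attempted shortcut: the {4} quantifier applies to the capturing group itself
bad_pattern <- "^.*\\b(particular phrase)\\W+(\\w+\\W+){4}.*$"

# Group 2 is overwritten on every repetition, so only the 4th word survives
bad_result <- gsub(bad_pattern, "\\2", this_txt, ignore.case = TRUE, perl = TRUE)
print(bad_result)
# [1] "Please "
```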

Answer 1

You may use

^.*\b(particular phrase)\W+((?:\w+\W+){3}\w+).*$

In R,

this_pattern <- "^.*\\b(particular phrase)\\W+((?:\\w+\\W+){3}\\w+).*$"
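Plugged into the question's gsub() call, this pattern should return the same four words. The sketch reuses this_txt from the question and passes perl = TRUE to use the PCRE engine; that flag is an assumption of this example, not part of the original answer.

```r
this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."

# Three "word + separator" pairs, then a final word with no trailing separator
this_pattern <- "^.*\\b(particular phrase)\\W+((?:\\w+\\W+){3}\\w+).*$"

four_words <- gsub(this_pattern, "\\2", this_txt, ignore.case = TRUE, perl = TRUE)
print(four_words)
# [1] "Extract These Words Please"
```

Because the pattern is anchored with ^ and consumes the whole string, sub() gives the same result here.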

See the regex demo.

(\\w+\\W+\\w+\\W+\\w+\\W+\\w+) is replaced with ((?:\\w+\\W+){3}\\w+). The outer ((?:\\w+\\W+){3}\\w+) is a capturing group ((...)) wrapped around the whole repetition, so \\2 holds all four words rather than only the last one. It contains two subpatterns:

- (?:\\w+\\W+){3} - a non-capturing group matching three repetitions of:
  - \\w+ - 1 or more word chars
  - \\W+ - 1 or more non-word chars
- \\w+ - 1 or more word chars.
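The breakdown generalizes: for n words, repeat the non-capturing group n - 1 times and finish with a single \\w+. A minimal sketch, using a hypothetical helper words_after() that builds the pattern with sprintf(); it assumes perl = TRUE, n >= 1, and a phrase that contains no regex metacharacters.

```r
# Hypothetical helper: return the n words that follow `phrase` in `txt`.
# Assumes n >= 1 and that `phrase` contains no regex metacharacters.
words_after <- function(txt, phrase, n) {
  # (n - 1) "word + separator" pairs, then one final word
  pattern <- sprintf("^.*\\b(%s)\\W+((?:\\w+\\W+){%d}\\w+).*$", phrase, n - 1L)
  sub(pattern, "\\2", txt, ignore.case = TRUE, perl = TRUE)
}

this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."

print(words_after(this_txt, "particular phrase", 4))
# [1] "Extract These Words Please"
print(words_after(this_txt, "particular phrase", 2))
# [1] "Extract These"
```

If fewer than n words follow the phrase, the pattern does not match and sub() returns the input unchanged, so callers may want to check for that case.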

Answer 1: 3 votes, accepted, 2019-06-12 16:36:04