How to say (\w+\W+) times 4 in regex (R gsub)
In R's gsub(), I need to capture the four words occurring after a particular phrase in a bigger string. Building on the wisdom offered here, I came up with:
^.*\\b(particular phrase)\\W+(\\w+\\W+\\w+\\W+\\w+\\W+\\w+).*$
For example:
this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."
this_pattern <- "^.*\\b(particular phrase)\\W+(\\w+\\W+\\w+\\W+\\w+\\W+\\w+).*$"
gsub(this_pattern, "\\2", this_txt, ignore.case = TRUE)
# [1] "Extract These Words Please"
But the repetition of \\w+\\W+ in the pattern is pretty unseemly. Surely there is a better way. I thought
^.*\\b(particular phrase)\\W+(\\w+\\W+){4}.*$
might work, but it doesn't.
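A quick check of what that attempt returns may help. This is a minimal sketch that reuses this_txt from the example above; the names attempt and res_attempt are illustrative, and perl = TRUE is an assumption added here so that the behaviour is that of the PCRE engine. A quantified capturing group keeps only its last iteration, so \\2 holds just the fourth word and the non-word characters that follow it.

```r
this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."

# The attempted pattern: group 2 is repeated four times by {4}.
attempt <- "^.*\\b(particular phrase)\\W+(\\w+\\W+){4}.*$"

# perl = TRUE is an assumption made for this sketch (PCRE semantics).
res_attempt <- gsub(attempt, "\\2", this_txt, ignore.case = TRUE, perl = TRUE)

# Only the last repetition of group 2 survives, trailing space included.
print(res_attempt)
# [1] "Please "
```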
You may use
^.*\b(particular phrase)\W+((?:\w+\W+){3}\w+).*$
In R,
this_pattern <- "^.*\\b(particular phrase)\\W+((?:\\w+\\W+){3}\\w+).*$"
See the regex demo
The (\\w+\\W+\\w+\\W+\\w+\\W+\\w+) is replaced with ((?:\\w+\\W+){3}\\w+). The ((?:\\w+\\W+){3}\\w+) is a capturing group ( (...) ) that contains two subpatterns:
(?:\\w+\\W+){3} - a non-capturing group matching three repetitions of
    \\w+ - 1 or more word chars
    \\W+ - 1 or more non-word chars
\\w+ - 1 or more word chars.
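The breakdown above can be checked with a short sketch. Everything named here is an assumption for illustration: extract_after is a hypothetical helper, n is a hypothetical parameter for the number of words, and perl = TRUE is added to pin the regex engine to PCRE. The helper also assumes the phrase contains no regex metacharacters, since it is pasted into the pattern unescaped.

```r
# Hypothetical helper: return the n words that follow `phrase` in `txt`.
# Assumes `phrase` has no regex metacharacters and that n >= 1.
extract_after <- function(txt, phrase, n = 4) {
  # (n - 1) repetitions of "word + separator", then one final word,
  # all wrapped in a single capturing group (group 2).
  pattern <- sprintf("^.*\\b(%s)\\W+((?:\\w+\\W+){%d}\\w+).*$", phrase, n - 1)
  # The pattern is anchored with ^ and $, so one substitution is enough.
  sub(pattern, "\\2", txt, ignore.case = TRUE, perl = TRUE)
}

this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."

res <- extract_after(this_txt, "particular phrase")
print(res)
# [1] "Extract These Words Please"
```

Because the quantifier sits on the inner non-capturing group, the outer group captures the whole run of words rather than only the last repetition. Note that when the pattern does not match, sub() returns the input unchanged, so a caller may want to test for a match first.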