
How to say (\w+\W+) times 4 in regex (R gsub)

In R's gsub(), I need to capture the four words that occur after a particular phrase in a larger string. Building on the wisdom offered here, I came up with: ^.*\\b(particular phrase)\\W+(\\w+\\W+\\w+\\W+\\w+\\W+\\w+).*$

For example:

this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."
this_pattern <- "^.*\\b(particular phrase)\\W+(\\w+\\W+\\w+\\W+\\w+\\W+\\w+).*$"
gsub(this_pattern, "\\2", this_txt, ignore.case = TRUE)
# [1] "Extract These Words Please"

But the repetition of \\w+\\W+ in the pattern is pretty unseemly. Surely there is a better way. I thought ^.*\\b(particular phrase)\\W+(\\w+\\W+){4}.*$ might work, but it doesn't.

You may use

^.*\b(particular phrase)\W+((?:\w+\W+){3}\w+).*$

In R,

this_pattern <- "^.*\\b(particular phrase)\\W+((?:\\w+\\W+){3}\\w+).*$"

See the regex demo.

Here, (\\w+\\W+\\w+\\W+\\w+\\W+\\w+) is replaced with ((?:\\w+\\W+){3}\\w+). The ((?:\\w+\\W+){3}\\w+) is a capturing group ( (...) ) that contains two subpatterns:

  • (?:\\w+\\W+){3} - a non-capturing group matching three repetitions of
    • \\w+ - 1 or more word chars
    • \\W+ - 1 or more non-word chars
  • \\w+ - 1 or more word chars.
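
To see the difference in practice, here is a minimal sketch that runs the attempt from the question next to the corrected pattern. The helper name words_after and its arguments are hypothetical, made up for illustration. The sketch uses sub() with perl = TRUE (my choice, so the repeated-group behaviour is the PCRE one) and assumes the phrase contains no regex metacharacters.

```r
this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."

# Attempt from the question: a quantified capturing group only keeps
# the text matched by its LAST repetition, so \\2 is just the 4th word
# plus the non-word chars that follow it.
bad_pattern <- "^.*\\b(particular phrase)\\W+(\\w+\\W+){4}.*$"
bad_result <- sub(bad_pattern, "\\2", this_txt, ignore.case = TRUE, perl = TRUE)

# Hypothetical helper: return the n words that follow `phrase`.
# Assumes `phrase` has no regex metacharacters and n >= 1.
words_after <- function(text, phrase, n) {
  # (?:\\w+\\W+){n-1} repeats "word + separator" without capturing,
  # and the outer (...) captures all n words as group 2.
  pattern <- sprintf("^.*\\b(%s)\\W+((?:\\w+\\W+){%d}\\w+).*$",
                     phrase, as.integer(n) - 1L)
  sub(pattern, "\\2", text, ignore.case = TRUE, perl = TRUE)
}

good_result <- words_after(this_txt, "particular phrase", 4)

print(bad_result)
# [1] "Please "
print(good_result)
# [1] "Extract These Words Please"
```

Note that when the pattern does not match at all, sub() returns the input string unchanged, so it may be worth checking with grepl() first.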
