Ruby regex: capturing a repeated group multiple times

Question

I'm trying to parse a subset of a webpage with regex, just for fun. It was fun until I ran into the following problem. I have a paragraph like the one below:

foo: 1, 2, 3, 4 and 5.
bar: 1, 2 and 3.

What I am trying to do is get the numbers in the first line of the paragraph, the one starting with foo:, by applying the following regex:

foo:(?:\s(\d)(?:,|\sand|\.))+

This matches the string above, but it captures only the last occurrence of the capture group, which is 5.

How can I capture all the numbers in the paragraph starting with foo:, up to the first occurrence of ., using a single regex pattern?
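The behaviour described above can be reproduced with a short sketch. The variable names `text` and `match` are only illustrative; the pattern is the one from the question.

```ruby
# The repeated group (\d) is overwritten on every iteration,
# so only the value from the final iteration survives.
text  = "foo: 1, 2, 3, 4 and 5."
match = text.match(/foo:(?:\s(\d)(?:,|\sand|\.))+/)

p match[0]        #=> "foo: 1, 2, 3, 4 and 5."
p match.captures  #=> ["5"]
```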

Answer 1

In most programming languages, the data captured by each iteration of a repeated capturing group is not stored separately, so you can't refer to the iterations individually. This is a good reason to use the \G anchor. \G makes a match start where the previous match ended; when there is no previous match, it matches at the beginning of the string, the same as \A.

So we need its first capability:
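As a minimal sketch of how \G behaves with String#scan in Ruby (the sample strings and variable names are made up for illustration):

```ruby
# \G pins each match attempt to the position where the previous match ended.
anchored   = "abcabcxabc".scan(/\Gabc/)  # stops as soon as the chain breaks at "x"
unanchored = "abcabcxabc".scan(/abc/)    # free to skip over the "x"

# With (?!\A) added, \G can no longer match at the very start of the string,
# so a chain of matches has to be started by something else.
never_starts = "abc".scan(/\G(?!\A)\w/)

p anchored      #=> ["abc", "abc"]
p unanchored    #=> ["abc", "abc", "abc"]
p never_starts  #=> []
```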

(?:foo:|\G(?!\A))\s*(\d+)\s*(?:,|and)?

Breakdown:

(?: Start a non-capturing group
- foo: Match the literal foo:
- | Or
- \G(?!\A) Continue from where the previous match ended (but not at the start of the string)
) End of the non-capturing group
\s* Any number of whitespace characters
(\d+) Match and capture one or more digits
\s* Any number of whitespace characters
(?:,|and)? An optional , or and

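This regex begins a match when it meets foo: in the input string. It then tries to find a following number, optionally followed by a comma or and (whitespace is allowed around the digits). Each subsequent match has to start exactly where the previous one ended, so the chain stops at the first character that doesn't fit, such as the period.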
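A minimal sketch of applying this pattern from Ruby, assuming String#scan is used to collect the successive matches (the names `text`, `pattern` and `numbers` are illustrative):

```ruby
text    = "foo: 1, 2, 3, 4 and 5.\nbar: 1, 2 and 3."
pattern = /(?:foo:|\G(?!\A))\s*(\d+)\s*(?:,|and)?/

# scan returns one single-element array per match because the pattern
# has one capture group; flatten turns that into a flat list of strings.
numbers = text.scan(pattern).flatten

p numbers  #=> ["1", "2", "3", "4", "5"]
```

The numbers on the bar: line are not picked up, because the chain of \G matches breaks at the period after 5 and foo: does not occur again.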

The \K token resets the match. It tells the engine to forget whatever has been matched so far (while keeping whatever has been captured) and leaves the cursor at that position.

I used \K in the Rubular regex so that the result set would contain only the captured digits rather than the full matched strings. However, Rubular seems to work differently and didn't need \K. It is not required at all.
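To illustrate what \K does in Ruby, here is a small sketch. The second pattern is a rearranged variant written for this example, not the regex from the answer: the separator is consumed before the digits so that \K can drop everything except the number itself.

```ruby
# Everything matched before \K is discarded from the reported match.
single = "foo: 42"[/foo:\s*\K\d+/]

# Hypothetical variant of the \G pattern without a capture group:
# the optional separator now comes first, then \K, then the digits.
text = "foo: 1, 2, 3, 4 and 5.\nbar: 1, 2 and 3."
kept = text.scan(/(?:foo:|\G(?!\A))(?:\s*(?:,|and))?\s*\K\d+/)

p single  #=> "42"
p kept    #=> ["1", "2", "3", "4", "5"]
```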

Answer 2

This answer uses just one regex, but admittedly does a bit of pre- and post-processing. (Please allow me a bit of fun. I do think there may be some instructional value here.)

str = "foo: 1, 2, 34, 4 and 5. and 6."

r = /
    \d+             # match one or more digits
    (?=[^.]+:oof\z) # positive lookahead: one or more characters other than a
                    # period, followed by ":oof" at the end of the string
    /x              # free-spacing regex definition mode

str.reverse.scan(r).join(' ').reverse.split
  #=> ["1", "2", "34", "4", "5"]

The steps are as follows.

s = str.reverse
  #=> ".6 dna .5 dna 4 ,43 ,2 ,1 :oof"
a  = s.scan r
  #=> ["5", "4", "43", "2", "1"]
b  = a.join(' ')
  #=> "5 4 43 2 1"
c  = b.reverse
  #=> "1 2 34 4 5"
c.split
  #=> ["1", "2", "34", "4", "5"]

An empty array is returned if there is no match.

So, why all the reversing? It allows me to use a positive lookahead, which, unlike a positive lookbehind, permits variable-length matches.
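The restriction can be checked directly. The sketch below assumes a Ruby build whose regex engine rejects unbounded lookbehind; it only reports what happens rather than relying on a specific error message. It also shows a slightly different post-processing step (reversing each match instead of joining and splitting), which is an alternative written for this example, not the code above.

```ruby
# Try to compile the "natural" lookbehind version of the pattern.
# Assumption: this raises RegexpError on engines without variable-length lookbehind.
lookbehind_error =
  begin
    Regexp.new('(?<=foo:[^.]*)\d+')
    nil
  rescue RegexpError => e
    e.message
  end
puts "lookbehind rejected: #{lookbehind_error}" if lookbehind_error

# The reversed-string lookahead works regardless.
str       = "foo: 1, 2, 34, 4 and 5. and 6."
lookahead = /\d+(?=[^.]+:oof\z)/
numbers   = str.reverse.scan(lookahead).map(&:reverse).reverse

p numbers  #=> ["1", "2", "34", "4", "5"]
```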

Answer 1: score 3, accepted, 2018-03-11 20:11:16
Answer 2: score -1, 2018-03-11 08:06:28
