
How does this regex match into groups?

I am looking at the regular expression ^\s*(_?)(\S+?)\1\s*$ from injector.js.

I have been able to understand how the string _non_ is matched. The first capturing group consists of _ , the second group consists of non , and the back-reference to the first capture group matches the trailing _ . So the first group is _ , the second group is non , and \1 matches the final _ .

However, I have not been able to understand how the strings _ , _non and __ are matched by the second group, given that the \1 in the expression would expect an _ at the end whenever there is an _ at the beginning.

Pattern: ^\s*(_?)(\S+?)\1\s*$

Overall, this pattern does the following:

^ start at the beginning of the string

\s* match 0 or more whitespace chars

(_?) match and capture 0 or 1 underscore (capture group 1)

(\S+?) non-greedy match and capture 1 or more non-whitespace chars (capture group 2)

\1 match whatever was captured in capture group 1

\s* match 0 or more whitespace chars

$ match end of line/string
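To check the group contents discussed in this answer, the pattern can be run against each subject. This is a minimal sketch: the name FN_ARG and the helper groups are chosen here for illustration and are not taken from the question.

```javascript
// The pattern from the question, written as a JavaScript regex literal.
const FN_ARG = /^\s*(_?)(\S+?)\1\s*$/;

// Hypothetical helper: returns both capture groups, or null when nothing matches.
function groups(subject) {
  const m = FN_ARG.exec(subject);
  return m === null ? null : { group1: m[1], group2: m[2] };
}

// Print the groups for every subject covered in this answer.
for (const subject of ["_", "_non", "_non_", "__", "_ _"]) {
  console.log(JSON.stringify(subject), groups(subject));
}
```

An empty string in group1 means the optional underscore was given up during backtracking; null means the subject does not match at all.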

Subject: _

Group 1: (empty)

Group 2: _

Initially the _ is matched by the first capture group. But then the engine moves on to the second capture group, which needs at least one char to match. The engine therefore backtracks and gives the char to the second group: the ? in the first capture group makes the underscore optional, and _ is a non-space char. Since ultimately nothing was captured in group 1 (because group 2 had to be satisfied), the \1 back-reference has nothing to match.

Subject: _non

Group 1: (empty)

Group 2: _non

Initially the _ is matched in group 1, then non is matched in group 2. Then the engine looks for a _ for the \1 reference, and there is none, so the engine backtracks, removes the _ from group 1 and matches it in group 2 instead, which then grows to cover _non .
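The order of attempts described here can be imitated by hand. The sketch below is only a simplified model of backtracking, not how a real regex engine is implemented, and it assumes the subject has no leading whitespace. The function name trace is hypothetical.

```javascript
// Hand-rolled imitation of the search order for ^\s*(_?)(\S+?)\1\s*$.
// Assumption of this sketch: the subject has no leading whitespace.
function trace(subject) {
  const attempts = [];
  // (_?) is greedy: try taking the underscore first, then fall back to empty.
  const g1Options = subject.startsWith("_") ? ["_", ""] : [""];
  for (const g1 of g1Options) {
    const rest = subject.slice(g1.length);
    // (\S+?) is lazy: try 1 char, then 2, and so on while chars are non-whitespace.
    for (let len = 1; len <= rest.length && /^\S$/.test(rest[len - 1]); len++) {
      const g2 = rest.slice(0, len);
      const tail = rest.slice(len);
      // \1 must repeat what group 1 captured; only whitespace may follow.
      const ok = tail.startsWith(g1) && /^\s*$/.test(tail.slice(g1.length));
      attempts.push({ g1, g2, ok });
      if (ok) return attempts;
    }
  }
  return attempts; // no attempt succeeded
}

console.log(trace("_non"));
```

For _non the trace lists three failed attempts with the underscore in group 1, followed by attempts with group 1 empty, ending when group 2 has grown to _non .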

Subject: _non_

Group 1: _

Group 2: non

Similar to the previous case: initially the _ is matched in group 1, then non is matched in group 2. Then the engine looks for a _ for the \1 reference, and this time it finds one, so group 1 keeps its _ and group 2 just has non .

Subject: __

Group 1: (empty)

Group 2: __

This is essentially the same as the first _ example. Initially the first _ is matched in group 1. Then the second _ is matched in group 2. Then \1 tries to match another _ , since group 1 captured one, but there is none left. Group 2 requires at least 1 char but can take more, so the regex engine backs up and puts group 1's match into group 2.

Subject: _ _

Group 1: (no match)

Group 2: (no match)

This results in no match. The engine starts out putting the first _ into group 1, but then fails at putting the space into group 2. So it backs up and puts the first _ into group 2 instead. Since group 1 is now empty, \1 has nothing to match. The space is then matched by \s* , but the match fails on the final _ because the pattern allows only whitespace before the end of the string.
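The difference between surrounding and embedded whitespace can be checked directly. A small sketch, with argPattern as an illustrative name:

```javascript
const argPattern = /^\s*(_?)(\S+?)\1\s*$/;

// Whitespace around the name is absorbed by the two \s* parts.
const padded = argPattern.exec("  _non_  ");
console.log(padded[1], padded[2]); // _ non

// Whitespace inside the name cannot be consumed by (\S+?), so there is no match.
console.log(argPattern.exec("_ _")); // null
```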

Sidenote

You asked in a comment:

If it matches the _ for the first group, does it have to match an _ in the \1 ? Does \1 refer to the expression or the result of the expression?

It references the result of the expression (what is actually captured), not the expression itself.
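A small comparison makes the distinction concrete. The two patterns below are illustrative and not taken from injector.js: one uses a back-reference, the other simply writes the sub-expression twice.

```javascript
// \1 repeats the captured text, so both characters must be identical.
const backref = /^(a|b)\1$/;
// Writing the sub-expression twice lets each copy match independently.
const repeated = /^(a|b)(a|b)$/;

console.log(backref.test("aa"));  // true
console.log(backref.test("ab"));  // false
console.log(repeated.test("ab")); // true
```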
