Why does this regex only work with an end-of-string symbol?

Given these strings:

'module1'

'{ module2 }'

'{ module3 as module4 }'

and needing a regular expression to capture the (sub)strings 'module1', 'module2' and 'module4', this works:

/({ )?(.*as )?(.+?)( })?$/

Which breaks down to:

({ )?       // Optional opening brace
(.*as )?    // Optional 'blah as '
(.+?)       // The important bit
( })?$      // Optional closing brace, EOS

Why does it fail to match if the end-of-string anchor $ is omitted?

(Also, I'm aware that the unneeded capture groups can be made into non-capturing groups, but I'm keeping them as they are so the pattern is easier to read...)

The problem

Let's play the two regex patterns out:

Pattern 1

Regex                        : String
/({ )?(.*as )?(.+?)( })?$/   : module1

1. Checks for "{ "    >> but it's optional and doesn't exist so pass.
2. Checks for ".*as " >> but it's optional and doesn't exist so pass.
3. Checks for ".+?"   >> matches any character 1 or more times until the next item it HAS to match (in a non-greedy manner)
4. Checks for " }"    >> but it's optional and doesn't exist so pass.
5. Checks for "$"     >> #3 has to keep matching up to this point, otherwise there won't be a match!

Pattern 2

Regex                        : String
/({ )?(.*as )?(.+?)( })?/    : module1

1. Checks for "{ "    >> but it's optional and doesn't exist so pass.
2. Checks for ".*as " >> but it's optional and doesn't exist so pass.
3. Checks for ".+?"   >> matches any character 1 or more times until the next item it HAS to match (in a non-greedy manner)
4. Checks for " }"    >> but it's optional and doesn't exist so pass.

The problem is that ".+?" is non-greedy. Every other term is optional (and skipped here, because it doesn't exist in the string), so nothing after ".+?" HAS to match. It therefore stops at the earliest possible point, i.e. after a single character, and the capture is just "m".
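
A minimal sketch of the comparison above, run against the three strings from the question. The helper name captureGroup3 is made up for illustration; it simply returns whatever the lazy group (.+?) ends up holding.

```javascript
// Hypothetical helper: returns the text captured by group 3, the lazy (.+?).
function captureGroup3(pattern, input) {
  const match = input.match(pattern);
  return match ? match[3] : null;
}

const withAnchor = /({ )?(.*as )?(.+?)( })?$/;
const withoutAnchor = /({ )?(.*as )?(.+?)( })?/;

const inputs = ['module1', '{ module2 }', '{ module3 as module4 }'];

const anchoredResults = inputs.map(s => captureGroup3(withAnchor, s));
const unanchoredResults = inputs.map(s => captureGroup3(withoutAnchor, s));

console.log(anchoredResults);   // [ 'module1', 'module2', 'module4' ]
console.log(unanchoredResults); // [ 'm', 'm', 'm' ]
```

Without the anchor the lazy group is satisfied by one character every time, which is why the second array only contains the first letter of each module name.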

A solution

This is tricky without knowing what the values for "module" may be (i.e. letters, spaces, numbers)...

However, something like...

(\w+)(?:\s*})?$
(                : Start of capturing group
 \w+             : Matches [a-zA-Z0-9_] one or more times
    )            : End of capturing group
     (?:         : Start of non-capturing group
        \s*}     : Matches 0 or more white space characters followed by a "}"
            )?   : End of non-capturing group and make it optional
              $  : Matches the end of the string

...will extract the modules without capturing the surrounding spaces, braces, etc.

NB

This only works if the module is:

  • The last word in the string
  • One word only
  • Only made up of letters, numbers, and underscores
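
To make those three conditions concrete, here is a small JavaScript sketch. The input strings are invented for illustration and are not from the question; each one breaks one of the assumptions listed above.

```javascript
const lastWord = /(\w+)(?:\s*})?$/;

// Hypothetical inputs, each violating one of the listed conditions.
const hyphenated = '{ my-module }'.match(lastWord);       // contains a character outside \w
const twoWords = '{ foo as my module }'.match(lastWord);  // module name is two words
const trailing = '{ module2 } // note'.match(lastWord);   // module is not the last word

console.log(hyphenated[1]); // "module"
console.log(twoWords[1]);   // "module"
console.log(trailing[1]);   // "note"
```

In each case the pattern still matches, but it captures only the final run of word characters, so the result is not the full module name.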

Example

$strings = [
    'module1',
    '{ module2 }',
    '{ module3 as module4 }'
];

foreach($strings as $string){
    preg_match('/(\w+)(?:\s*})?$/', $string, $match);
    var_dump($match[1]);
}

/*
Output

string(7) "module1"
string(7) "module2"
string(7) "module4"

*/

Second example

Because I realised this question was asked for JS, not PHP!!!!

var strings = [
    'module1',
    '{ module2 }',
    '{ module3 as module4 }'
];
var pattern = /(\w+)(?:\s*})?$/;
for (var i = 0; i < strings.length; i++) {
    console.log(strings[i].match(pattern)[1]);
}

/*
Output

module1
module2
module4

*/

Without using the $ to assert the end of the string, the minimum match is 1 char due to the .+? and the rest is optional.

No rule then states that the pattern has to keep matching until the end of the string, so it can produce further (short) matches until all the characters are processed.
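
A quick way to see this, as a sketch: add the g flag (my addition, not part of the original pattern) and collect every match on 'module1'. Each match is a single character.

```javascript
// The question's pattern without $, plus the g flag so that all matches are returned.
const unanchoredGlobal = /({ )?(.*as )?(.+?)( })?/g;

const pieces = 'module1'.match(unanchoredGlobal);

console.log(pieces); // [ 'm', 'o', 'd', 'u', 'l', 'e', '1' ]
```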

When there is a $ present, the non-greedy .+? has to keep expanding one character at a time until the rest of the pattern can succeed: either it reaches the end of the string, or the optional " }" matches right before the end of the string.

Note that using ({ )? will also match an opening curly brace without the closing one, or the other way around.
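
As a small illustration of that last point, here is the question's pattern applied to two unbalanced inputs. Both strings are made up for this sketch.

```javascript
const original = /({ )?(.*as )?(.+?)( })?$/;

// Hypothetical unbalanced inputs: one brace is missing in each.
const missingClose = '{ module2'.match(original);
const missingOpen = 'module2 }'.match(original);

console.log(missingClose[3]); // "module2"
console.log(missingOpen[3]);  // "module2"
```

Because the two brace groups are optional independently of each other, neither input is rejected.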


What you might do instead is require both curly braces, optionally match everything up to " as " between them, and capture the rest in group 1. If there are no braces, capture the whole string in group 2.

\{(?:[^{}]* as )? *([^{}]*) }|(.+)
  • \{ Match {
  • (?:[^{}]* as )? * Optionally match the part inside the curly braces up to and including " as ", then any following spaces
  • ([^{}]*) Capture group 1, match 0+ times any char except { and }
  • } Match }
  • | Or
  • (.+) Capture group 2, match the whole line (at least a single character)


const regex = /\{(?:[^{}]* as )? *([^{}]*) }|(.+)/g;
[
  'module1',
  '{ module2 }',
  '{ module3 as module4 }'
].forEach(s => console.log(
  // Use group 1 (braced form) when it took part in the match, otherwise group 2
  Array.from(s.matchAll(regex), m => m[1] !== undefined ? m[1] : m[2])
));
