Is it normal for a RegEx to give more than one result for a single match?

I am sorry if the question title is a bit confusing; I will explain my confusion in detail here.

I want to use a regular expression to match apple, orange, mango, apple[N], and orange[N], where N can be any number or empty. (Notice that mango will NOT have [].) Here are some examples:

  1. apple MATCHED
  2. orange MATCHED
  3. apple[] MATCHED
  4. orange[] MATCHED
  5. apple[15] MATCHED
  6. apple[05] NOT MATCHED (because a number should not start with 0)
  7. mango[] NOT MATCHED (because mango can't have [])

Here is the regular expression I came up with:

/^(mango|(apple|orange)(\[[1-9][0-9]*\])?)$/

This regular expression works, but it usually gives more than one matching group. For example, apple[15] will give 1. apple[15] 2. apple[15] 3. [15]

Actually, the behavior is normal, as I have many () which create many groups, but I wonder whether I am constructing this regular expression the right way, because it gives too many results for a single match.

Moreover, is there any way I can optimize this regular expression? The requirement is fairly straightforward, yet the expression seems complicated.

Thank you.

It's matching those sub-groups because that's what () does. If you want to group items together without capturing them in the output, use non-capturing groups (?:...). For example, (?:apple|orange) would match apple or orange, but would not capture the group in the output.
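A minimal sketch of that difference, using a deliberately simplified pattern rather than the full one from the question (the variable names here are only for illustration):

```javascript
// Capturing group: the alternation is stored as a separate entry.
const capturing = /^(apple|orange)$/.exec('apple');

// Non-capturing group: the same text matches, but no extra entry is stored.
const nonCapturing = /^(?:apple|orange)$/.exec('apple');

console.log(Array.from(capturing));    // whole match plus one captured group
console.log(Array.from(nonCapturing)); // whole match only
```

Both patterns accept exactly the same strings; only the shape of the result array changes.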

If you want to capture only the entire match, without any subgroups, use the following:

^mango$|^(?:apple|orange)(?:\[(?:[1-9][0-9]*)?\])?$

var strArr = ['apple', 'orange', 'apple[]', 'orange[]', 'apple[15]', 'apple[05]', 'mango[]', 'mango'];
// In a regex literal the brackets are escaped with a single backslash.
var re = /^mango$|^(?:apple|orange)(?:\[(?:[1-9][0-9]*)?\])?$/;

// Append one line per test string showing whether it matches.
strArr.forEach(function (str) {
  document.body.insertAdjacentHTML('beforeend', str + ' - match? ' + re.test(str) + '<br>');
});

Your regular expression declares three capturing groups: an outer group G1 that contains G2 and G3, as in (mango|(G2)(G3)?). This is why, when you match, you get an array with four values:

1. apple[15]  The whole match
2. apple[15]  G1: (mango|(apple|orange)(\[[1-9][0-9]*\])?)
3. apple      G2: (apple|orange)
4. [15]       G3: (\[[1-9][0-9]*\])?
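As a rough check of the list above, the following sketch runs the pattern from the question (with a closing / added, which is an assumption about what was intended) and prints each entry. The labels array is only for illustration.

```javascript
// The pattern from the question, with its closing delimiter added.
const original = /^(mango|(apple|orange)(\[[1-9][0-9]*\])?)$/;

const result = 'apple[15]'.match(original);

// Index 0 is the whole match; indexes 1 to 3 are G1, G2 and G3.
const labels = ['whole match', 'G1', 'G2', 'G3'];
labels.forEach((label, i) => console.log(label + ': ' + result[i]));

// As written, the brackets require at least one digit,
// so 'apple[]' is not accepted by this pattern.
const emptyBrackets = original.test('apple[]');
console.log('apple[] matches? ' + emptyBrackets);
```

Note that, as written, this pattern rejects apple[], which the question lists as a case that should match; making the digits optional inside the brackets, as in (?:[1-9][0-9]*)?, addresses that.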

If you alter the regular expression to /^(mango)|(apple|orange)(\[[1-9][0-9]*\])?$/ you will get the same result, except that #2 from above will be undefined unless mango is the input. Note that this expression will still accept mango[123], but the match will not include the number, because ^ applies only to the first alternative and $ only to the second.
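A small sketch of both effects, assuming the altered pattern is written as a regex literal with single backslashes. The strict pattern is the fully anchored one from the other answer, included only for comparison.

```javascript
// The altered pattern from this answer.
const altered = /^(mango)|(apple|orange)(\[[1-9][0-9]*\])?$/;

const apple = 'apple[15]'.match(altered);
console.log(apple[1]);           // undefined: the (mango) group did not participate
console.log(apple[2], apple[3]); // the fruit name and the bracketed number

// ^ applies only to the first alternative and $ only to the second,
// so 'mango[123]' is accepted, but only the 'mango' prefix is matched.
const mango = 'mango[123]'.match(altered);
console.log(mango[0]);

// For comparison: the fully anchored, non-capturing pattern rejects it.
const strict = /^mango$|^(?:apple|orange)(?:\[(?:[1-9][0-9]*)?\])?$/;
console.log(strict.test('mango[123]'));
```

If mango[123] must be rejected outright, each alternative needs its own ^ and $, or the whole alternation needs to be wrapped in a group between the anchors.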
