
Is it normal for a regex to return more than one matching group for a single match?

Sorry if the question title is a bit confusing; I will explain my confusion in detail here.

I want to use a regular expression to match apple, orange, mango, apple[N], and orange[N], where N is either a number or empty. (Note that mango can NOT be followed by [].) Here are some examples:

  1. apple MATCHED
  2. orange MATCHED
  3. apple[] MATCHED
  4. orange[] MATCHED
  5. apple[15] MATCHED
  6. apple[05] NOT MATCHED (because a number should not start with 0)
  7. mango[] NOT MATCHED (because mango can't have [])

Here is the regular expression I came up with:

/^(mango|(apple|orange)(\[[1-9][0-9]*\])?)$/

This regular expression works, but it usually returns more than one matching group. For example, apple[15] gives: 1. apple[15] 2. apple[15] 3. [15]

I understand that this behavior is expected, since every pair of () creates a group, but I wonder whether I am constructing this regular expression the right way, because it returns too many results for a single match.

Moreover, is there any way I can optimize this regular expression? What it does is fairly straightforward, yet it seems more complicated than it needs to be.

Thank you.

It returns those sub-groups because that is what () does: each pair of parentheses is a capturing group. If you want to group items together without capturing them in the output, use a non-capturing group, (?:...). For example, (?:apple|orange) matches apple or orange, but does not add a group to the result.
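As a minimal sketch of that difference, the snippet below runs the same alternation once with a capturing group and once with a non-capturing group. The variable names are only illustrative.

```javascript
// Capturing group: the alternative that matched is stored as element 1.
const capturing = 'apple'.match(/^(apple|orange)$/);
console.log(capturing.length); // 2
console.log(capturing[1]);     // 'apple'

// Non-capturing group: same grouping, but nothing extra is stored.
const nonCapturing = 'apple'.match(/^(?:apple|orange)$/);
console.log(nonCapturing.length); // 1
console.log(nonCapturing[0]);     // 'apple'
```

Element 0 is always the whole match; only capturing groups add further elements.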

If you want to capture only the entire match, without subgroups, use the following (it also makes the number inside [] optional, so that apple[] and orange[] match):

^mango$|^(?:apple|orange)(?:\[(?:[1-9][0-9]*)?\])?$


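    // Test inputs taken from the question.
    var strArr = [
      'apple', 'orange', 'apple[]', 'orange[]',
      'apple[15]', 'apple[05]', 'mango[]', 'mango'
    ];

    // Non-capturing groups only; the number inside [] is optional.
    var re = /^mango$|^(?:apple|orange)(?:\[(?:[1-9][0-9]*)?\])?$/;

    // Print each input and whether it matches (runs in a browser page).
    strArr.forEach(function(str) {
      document.body.insertAdjacentHTML('beforeend', str + ' - match? ' + re.test(str) + '<br>');
    });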


Your regular expression declares three capturing groups, with G2 and G3 nested inside G1: (mango|(G2)(G3)?). This is why a match returns an array with four values:

1. apple[15]: the whole match
2. apple[15]: G1, (mango|(apple|orange)(\[[1-9][0-9]*\])?)
3. apple: G2, (apple|orange)
4. [15]: G3, (\[[1-9][0-9]*\])?
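A quick way to see those four values is to run the regular expression from the question. This is only a sketch, and the variable names are illustrative.

```javascript
// The regular expression from the question, with its three capturing groups.
const original = /^(mango|(apple|orange)(\[[1-9][0-9]*\])?)$/;

// Array.from drops the extra index/input properties for cleaner printing.
const appleResult = Array.from('apple[15]'.match(original));
console.log(appleResult); // [ 'apple[15]', 'apple[15]', 'apple', '[15]' ]

// Groups that take no part in the match are undefined.
const mangoResult = Array.from('mango'.match(original));
console.log(mangoResult); // [ 'mango', 'mango', undefined, undefined ]

// As written, the number inside [] is mandatory, so apple[] does not match.
const emptyBrackets = 'apple[]'.match(original);
console.log(emptyBrackets); // null
```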

If you changed the regular expression to /^(mango)|(apple|orange)(\[[1-9][0-9]*\])?$/, you would get the same result, except that #2 above would be undefined unless the input begins with mango. Note that this expression still accepts mango[123], although the match will not include the number: without the outer group, ^ applies only to the mango alternative and $ only to the apple/orange alternative.
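The effect of dropping the outer group can be checked with a short sketch like this one. The names are illustrative, and the xapple input is an extra test case not taken from the question.

```javascript
// Without the outer group, the alternation splits the anchors:
// ^ belongs to the first alternative only, $ to the second only.
const altered = /^(mango)|(apple|orange)(\[[1-9][0-9]*\])?$/;

// G1 is undefined because the mango alternative did not take part.
const appleMatch = Array.from('apple[15]'.match(altered));
console.log(appleMatch); // [ 'apple[15]', undefined, 'apple', '[15]' ]

// mango[123] is accepted, but only 'mango' is matched.
const mangoMatch = Array.from('mango[123]'.match(altered));
console.log(mangoMatch); // [ 'mango', 'mango', undefined, undefined ]

// The second alternative is not anchored at the start, so a prefix slips through.
const prefixed = altered.test('xapple');
console.log(prefixed); // true
```

If strict validation is the goal, each alternative needs both anchors, or the alternation has to stay inside a group between ^ and $.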
