
Regex with optional capture fields

I'm trying to use a regex to capture numbers from a string in JavaScript. I've built a pattern that captures only the numbers when all fields are present:

Target string: 3 Adults, 2 Children, 1 Infant

Regex pattern: ([1-9])(?:.Adults?.*)([1-9])(?:.Child.*)([1-9])(?:.Infant.*)

What I'm looking to capture: [3,2,1]

However, only the adults part is always present in the target string; the children and infants parts may be missing.

For the target string, I'd like to be able to handle:

3 Adults, 1 Infant

Returning: [3,0,1] or alternatively [3,,1]

3 Adults

Returning [3,0,0] , or alternatively [3]

1 Adult, 1 Child, 2 Infants

Returning [1,1,2]

I've tried wrapping the Children and Infant sections in their own groups to make them optional:

([1-9])(?:.Adults?.*)(([1-9])(?:.Child.*))?(([1-9])(?:.Infant.*))?

But in that case it seems to match nothing against any target string.

Is what I'm trying to do possible? Can a regex return a placeholder or null value for a part that doesn't match, so that the Infants count isn't moved forward into the Child position of the returned values when no children are present?

I've created a regex101 page with test strings, but I don't seem to be making much progress: https://regex101.com/r/8NSYMc/1

Any help appreciated!

You may use

(\d+)\s+Adults?(?:,\s*(\d+)\s+Child(?:ren)?)?(?:,\s*(\d+)\s+Infant)?

See the regex demo

Details:

  • (\d+) - Group 1: one or more digits
  • \s+Adults? - 1+ whitespaces, Adult and an optional s
  • (?:,\s*(\d+)\s+Child(?:ren)?)? - an optional non-capturing group matching a sequence of:
    • ,\s* - a comma and 0+ whitespaces
    • (\d+) - Group 2: one or more digits
    • \s+Child(?:ren)? - 1+ whitespaces, Child and an optional ren substring
  • (?:,\s*(\d+)\s+Infant)? - an optional non-capturing group matching a sequence of:
    • ,\s* - a comma and 0+ whitespaces
    • (\d+) - Group 3: one or more digits
    • \s+Infant - 1+ whitespaces and the Infant substring.
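
As a quick check of the breakdown above, here is a minimal sketch that runs the pattern against the question's sample strings. The helper name rawGroups is made up for illustration; it just returns capture groups 1 to 3 as they come back from String.prototype.match.

```javascript
// Pattern from the answer, unchanged.
const pattern = /(\d+)\s+Adults?(?:,\s*(\d+)\s+Child(?:ren)?)?(?:,\s*(\d+)\s+Infant)?/;

// Hypothetical helper: capture groups 1..3 as returned by the engine
// (strings for groups that matched, undefined for groups that did not).
function rawGroups(text) {
  const m = text.match(pattern);
  return m ? m.slice(1) : null;
}

console.log(rawGroups("3 Adults, 2 Children, 1 Infant")); // [ '3', '2', '1' ]
console.log(rawGroups("3 Adults, 1 Infant"));             // [ '3', undefined, '1' ]
console.log(rawGroups("3 Adults"));                       // [ '3', undefined, undefined ]
console.log(rawGroups("1 Adult, 1 Child, 2 Infants"));    // [ '1', '1', '2' ]
```

Note that the infant count stays in the third slot even when the children part is missing, because each count has its own fixed group number.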

Wiktor has already provided an answer, but I'll explain the problems of your original regex.

First it's important to know that matches are by default greedy, that is, the regex tries to match as much as possible. The .* construct is therefore somewhat dangerous, since it may swallow much more than anticipated.

Since the children and infant groups were mandatory in the beginning, that would limit the text that the greedy match could swallow while still matching all parts. However, after you make the parts optional, the greediness of the adults part would consume the rest, and the two other parts would no longer match.
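
To see the effect, here is a small sketch comparing the two patterns from the question on the same input. The variable names are only for illustration.

```javascript
const text = "3 Adults, 2 Children, 1 Infant";

// First pattern from the question: all three parts are mandatory,
// so the greedy .* has to backtrack until the later parts can match.
const mandatory = /([1-9])(?:.Adults?.*)([1-9])(?:.Child.*)([1-9])(?:.Infant.*)/;

// Second pattern from the question: child and infant parts are optional,
// so the first .* can keep the whole rest of the string.
const optional = /([1-9])(?:.Adults?.*)(([1-9])(?:.Child.*))?(([1-9])(?:.Infant.*))?/;

const a = text.match(mandatory);
console.log(a.slice(1)); // [ '3', '2', '1' ]

const b = text.match(optional);
console.log(b[0]);       // 3 Adults, 2 Children, 1 Infant
console.log(b.slice(1)); // [ '3', undefined, undefined, undefined, undefined ]
```

The second pattern does match, but the adults part has already consumed everything, so the optional groups match nothing. The extra capturing wrappers also move the counts to groups 3 and 5 instead of 2 and 3.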

That's why Wiktor's solution matches explicit text instead of the match-all .* construct. Also, to avoid shifting the group numbers in the result, make the optional wrapper groups non-capturing, i.e. start them with (?: .

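As for the placeholder: no, a regex cannot return a custom placeholder value (in JS, a group that did not take part in the match simply comes back as undefined). That is easy to deal with, though, since you can do Number(match[1] || 0), for instance, to parse with a default of 0 for missing values.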
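
Putting the two answers together, a minimal sketch could look like this. The function name parseCounts is an assumption for illustration; it uses Wiktor's pattern and the default-to-0 idea described above.

```javascript
// Pattern from Wiktor's answer.
const countsPattern = /(\d+)\s+Adults?(?:,\s*(\d+)\s+Child(?:ren)?)?(?:,\s*(\d+)\s+Infant)?/;

// Hypothetical helper: returns [adults, children, infants] as numbers,
// using 0 for any part missing from the string, or null if nothing matches.
function parseCounts(text) {
  const m = countsPattern.exec(text);
  if (!m) return null;
  // Unmatched groups are undefined, so `|| 0` supplies the default.
  return [m[1], m[2], m[3]].map((g) => Number(g || 0));
}

console.log(parseCounts("3 Adults, 2 Children, 1 Infant")); // [ 3, 2, 1 ]
console.log(parseCounts("3 Adults, 1 Infant"));             // [ 3, 0, 1 ]
console.log(parseCounts("3 Adults"));                       // [ 3, 0, 0 ]
console.log(parseCounts("1 Adult, 1 Child, 2 Infants"));    // [ 1, 1, 2 ]
```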

