
Regex to capture optional characters

I want to pull a base string, either Wax or noWax, out of a longer string, along with any data attached directly before or after it when the base is Wax. I'm having trouble getting the last item in my list below (noWax) to match.

Can anyone flex their regex muscles? I'm fairly new to regex so advice on optimization is welcome as long as all matches below are found.

What I'm working with in Regex101:


/(?<Wax>Wax(?:Only|-?\d+))/mg

Original string               Needs to be extracted in a capturing group
Loc3_341001_WaxOnly_S212      WaxOnly
Loc4_34412-a_Wax4_S231        Wax4
Loc3a_231121-a_Wax-4-S451     Wax-4
Loc3_34112_noWax_S311         noWax
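
To make the failure concrete, here is a minimal sketch that runs the pattern above against the four sample strings. It assumes Python's `re` module rather than Regex101's PCRE flavor, so the named group is spelled `(?P<Wax>...)`; the helper name `extract` is made up for illustration.

```python
import re

# The pattern from the question, with Python's (?P<name>...) group syntax.
pattern = re.compile(r"(?P<Wax>Wax(?:Only|-?\d+))")

samples = [
    "Loc3_341001_WaxOnly_S212",
    "Loc4_34412-a_Wax4_S231",
    "Loc3a_231121-a_Wax-4-S451",
    "Loc3_34112_noWax_S311",
]

def extract(text):
    """Return the 'Wax' group, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group("Wax") if m else None

results = [extract(s) for s in samples]
print(results)  # ['WaxOnly', 'Wax4', 'Wax-4', None]
```

The last string fails because the pattern requires 'Only' or digits after 'Wax', and in 'noWax_S311' the next character is '_'.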

Here is one way to do so, using a conditional:

(?<Wax>(no)?Wax(?(2)|(?:Only|-?\d+)))

See the online demo.


  • (no)? : Optional capture group.
  • (?(2)|(?:Only|-?\d+)) : If.
    • (2) : Test if capture group 2 exists ( (no) ). If it does, do nothing.
    • | : Or.
    • (?:Only|-?\d+) : Otherwise, match 'Only' or an optional '-' followed by one or more digits.
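
As a rough check of the conditional approach, the sketch below ports the pattern to Python's `re` module, which also supports `(?(2)yes|no)` conditionals. The `(?P<Wax>...)` spelling and the helper name `extract_conditional` are assumptions of this sketch, not part of the original answer.

```python
import re

# Group 1 is the named group "Wax"; group 2 is the optional (no).
# (?(2)|...) reads: if group 2 matched, match nothing more;
# otherwise require 'Only' or an optional '-' plus digits.
conditional = re.compile(r"(?P<Wax>(no)?Wax(?(2)|(?:Only|-?\d+)))")

samples = [
    "Loc3_341001_WaxOnly_S212",
    "Loc4_34412-a_Wax4_S231",
    "Loc3a_231121-a_Wax-4-S451",
    "Loc3_34112_noWax_S311",
]

def extract_conditional(text):
    """Return the 'Wax' group, or None when nothing matches."""
    m = conditional.search(text)
    return m.group("Wax") if m else None

conditional_results = [extract_conditional(s) for s in samples]
print(conditional_results)  # ['WaxOnly', 'Wax4', 'Wax-4', 'noWax']
```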

I assume the following match is desired.

  • the match must include 'Wax'
  • 'Wax' is to be preceded by '_' or by '_no'. If the latter, 'no' is included in the match.
  • 'Wax' may be followed by:
    • 'Only' followed by '_', in which case 'Only' is part of the match, or
    • one or more digits, followed by '_', in which case the digits are part of the match, or
    • '-' followed by one or more digits, followed by '-', in which case '-' followed by one or more digits is part of the match.

If these assumptions are correct, the string can be matched against the following regular expression:

(?<=_)(?:(?:no)?Wax(?:(?:Only|\d+)?(?=_)|\-\d+(?=-)))

Demo

The regular expression can be broken down as follows.

(?<=_)            # positive lookbehind asserts previous character is '_'
(?:               # begin non-capture group
  (?:no)?         # optionally match 'no'
  Wax             # match literal
  (?:             # begin non-capture group
    (?:Only|\d+)? # optionally match 'Only' or >=1 digits
    (?=_)         # positive lookahead asserts next character is '_'
    |             # or
    \-\d+         # match '-' followed by >= 1 digits
    (?=-)         # positive lookahead asserts next character is '-'
  )               # end non-capture group
)                 # end non-capture group
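
The breakdown above can be tried out with a short sketch, here assuming Python's `re` module and its `re.VERBOSE` flag so the pattern can keep its comments. The helper name `extract_lookaround` and the extra test string 'Loc1_Wax_S1' are illustrative and do not come from the question.

```python
import re

lookaround = re.compile(
    r"""
    (?<=_)               # previous character is '_'
    (?:no)?              # optionally match 'no'
    Wax                  # match literal
    (?:
        (?:Only|\d+)?    # optionally match 'Only' or >= 1 digits
        (?=_)            # next character is '_'
      |
        \-\d+            # '-' followed by >= 1 digits
        (?=-)            # next character is '-'
    )
    """,
    re.VERBOSE,
)

samples = [
    "Loc3_341001_WaxOnly_S212",
    "Loc4_34412-a_Wax4_S231",
    "Loc3a_231121-a_Wax-4-S451",
    "Loc3_34112_noWax_S311",
]

def extract_lookaround(text):
    """Return the whole match, or None when nothing matches."""
    m = lookaround.search(text)
    return m.group(0) if m else None

lookaround_results = [extract_lookaround(s) for s in samples]
print(lookaround_results)  # ['WaxOnly', 'Wax4', 'Wax-4', 'noWax']
```

Because the 'Only'-or-digits part is optional, a bare 'Wax' followed by '_' (as in 'Loc1_Wax_S1') also matches under these assumptions.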
