
Regex optional non-capturing groups

I am a total regex noob and have spent hours trying to solve this puzzle. I think I have to use some kind of optional non-capturing groups or alternation.

I want to match the following strings:

  1. Neuer Film a von 1000

  2. Neuer Film a von 1000 mit b

  3. Neuer Film a von 1000 mit b und c

  4. Neuer Film a von 1000 mit b und c und d

  5. Neuer Film a mit b

  6. Neuer Film a mit b und c

  7. Neuer Film a mit b und c und d

My regex looks like this:

var regex = /(?:[nN]euer [Ff]ilm\s?)(.*)(?:[vV]on).(\d{4}).(?:[Mm]it)(.*)(?:[uU]nd)(.*)/g;

The problem is that it matches only strings 3 and 4. It also does not split at every "und": the extra "und" part ends up inside group 3 rather than in group 4.

Can someone please help me with my regex (which is not very user-friendly at all ;)?

You do need optional non-capturing groups, i.e. (?:...)?, but you also need anchors (^ to match the start of the string and $ to match its end) and lazy dot matching (.*?, which matches as few characters as possible).
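To see why all three pieces matter, here is a minimal sketch on a simplified pattern (only the "von" part, without the character classes). The variable names are made up for illustration; the point is how group 1 and group 2 change when the dot pattern is greedy or when the $ anchor is missing.

```javascript
// Simplified illustration: only the optional "von <year>" part is modelled.
const input = 'Neuer Film a von 1000';

// Greedy (.*) grabs the rest of the line first, so the optional group
// has nothing left to match and group 2 stays undefined.
const greedy = /^Neuer Film\s*(.*)(?:\s*von\s+(\d{4}))?$/.exec(input);

// Lazy (.*?) together with $ expands one char at a time until the
// optional group and the end anchor can both match.
const lazy = /^Neuer Film\s*(.*?)(?:\s*von\s+(\d{4}))?$/.exec(input);

// Lazy (.*?) without $ is allowed to stop right away, so group 1 is
// empty and the optional group is skipped.
const unanchored = /^Neuer Film\s*(.*?)(?:\s*von\s+(\d{4}))?/.exec(input);

console.log(greedy[1], greedy[2]);         // a von 1000 undefined
console.log(lazy[1], lazy[2]);             // a 1000
console.log(JSON.stringify(unanchored[1]), unanchored[2]); // "" undefined
```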

You may use

/^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/

See the regex demo. In the demo, the /gm modifiers are necessary since the input is a multiline string.
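As a rough sketch of what the modifiers change, the snippet below runs the same pattern over a two-line string with and without the m flag. The sample text and variable names are assumptions for illustration; String.prototype.matchAll requires the g flag.

```javascript
// Two of the sample strings joined into one multiline input.
const text = [
  'Neuer Film a von 1000',
  'Neuer Film a mit b und c',
].join('\n');

// With g and m: ^ and $ match at every line boundary, so each line is found.
const rxMultiline = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/gm;

// With g only: ^ and $ match just the start and end of the whole input.
const rxSingle = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/g;

// Collect the four capture groups of every match.
const rows = Array.from(text.matchAll(rxMultiline), m => m.slice(1));
const rowsWithoutM = Array.from(text.matchAll(rxSingle), m => m.slice(1));

console.log(rows.length);         // 2
console.log(rowsWithoutM.length); // 0
```

Note that \s also matches line breaks, so on multiline input it may be safer to replace \s with a pattern that excludes newlines if a line can end right after "Film".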

Pattern details :

  • ^ - start of a string anchor
  • [nN]euer [Ff]ilm - Neuer Film / Neuer film / neuer Film / neuer film
  • \s* - zero or more whitespaces
  • (.*?) - Group 1: any 0+ chars other than line break chars, as few as possible (that is, up to the leftmost occurrence of the subsequent subpatterns)
  • (?:\s*[vV]on\s+(\d{4}))? - 1 or 0 occurrences of:
    • \s* - 0+ whitespaces
    • [vV]on - von or Von
    • \s+ - 1+ whitespaces
    • (\d{4}) - Group 2: 4 digits
  • (?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)? - an optional non-capturing group matching 1 or 0 occurrences of:
    • \s+ - 1+ whitespaces
    • [Mm]it - Mit or mit
    • \s* - 0+ whitespaces
    • (.*?) - Group 3 matching any 0+ chars other than line break chars, as few as possible
    • (?:\s*[uU]nd\s*(.*))? - an optional non-capturing group matching 1 or 0 occurrences of:
      • \s*[uU]nd\s* - und or Und enclosed with 0+ whitespaces
      • (.*) - Group 4 matching any 0+ chars other than line break chars, as many as possible
  • $ - end of string.

 // The seven sample strings from the question
 var strs = [
   'Neuer Film a von 1000',
   'Neuer Film a von 1000 mit b',
   'Neuer Film a von 1000 mit b und c',
   'Neuer Film a von 1000 mit b und c und d',
   'Neuer Film a mit b',
   'Neuer Film a mit b und c',
   'Neuer Film a mit b und c und d'
 ];
 // Single backslashes: inside a regex literal, \s and \d need no extra escaping
 var rx = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;
 for (var s of strs) {
   var m = rx.exec(s);
   if (m) {
     console.log('-- ' + s + ' ---');
     console.log('Group 1: ' + m[1]);
     // Groups 2 to 4 are undefined when their optional part did not match
     if (m[2]) console.log('Group 2: ' + m[2]);
     if (m[3]) console.log('Group 3: ' + m[3]);
     if (m[4]) console.log('Group 4: ' + m[4]);
   }
 }
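Since Group 4 is greedy, a string with two "und" parts yields "c und d" in that group. If separate items are wanted, one possible follow-up (an assumption, not part of the answer above) is to split Group 4 on "und" afterwards. The function name parseFilm and the property names title, year and names are hypothetical.

```javascript
// Same pattern as above, written with single backslashes.
const filmRx = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;

// Hypothetical helper: turns one line into an object, or null if it does not match.
function parseFilm(line) {
  const m = filmRx.exec(line);
  if (!m) return null;
  const [, title, year, first, rest] = m;
  // Group 4 may still contain "und", so split it into individual items.
  const tail = rest ? rest.split(/\s+[uU]nd\s+/) : [];
  // Drop undefined or empty entries (e.g. when the "mit" part is absent).
  const names = [first, ...tail].filter(Boolean);
  return { title: title, year: year || null, names: names };
}

console.log(parseFilm('Neuer Film a von 1000 mit b und c und d'));
console.log(parseFilm('Neuer Film a mit b'));
```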


 