
How to match same element in between a regex substring

For example, in the string:

=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=

How can I match only the substrings that are enclosed by the same delimiter on both sides, i.e. "=" on both sides or "/" on both sides, but not "=" on one side and "/" on the other? I want to extract only the matches that start with a capital letter and have 3 or more letters in total between the = or / delimiters.

I tried putting (=|/) on the left and on the right of the capturing group for the substring, but that also matches the cases where there is = on one side and / on the other.
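To reproduce the problem described above, here is a minimal sketch of that attempt. The exact pattern is an assumption reconstructed from the description (an alternation (=|/) on each side of a capturing group). Because the two alternations are independent of each other, nothing forces the closing delimiter to equal the opening one.

```javascript
const text = "=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=";

// Hypothetical reconstruction of the attempt: each side accepts "=" or "/"
// independently, so "=Invalid/" is accepted even though the delimiters differ.
const attempt = /(=|\/)([A-Z][a-z]{2,})(=|\/)/g;

// Collect the word from Group 2 of every match.
const attemptMatches = Array.from(text.matchAll(attempt), (m) => m[2]);
console.log(attemptMatches); // ["Hawai", "Cyprus", "Invalid"]
```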

Keep in mind that I'm still learning regex and I don't know how to make it strictly match on both sides.

You can use

/(?<=([\/=]))[A-Z][a-z]{2,}(?=\1)/g
/([\/=])([A-Z][a-z]{2,})\1/g

See the regex #1 demo and the regex #2 demo. Details:

  • (?<=([\/=])) - a positive lookbehind that matches a location immediately preceded by / or = (captured into Group 1)
  • [A-Z] - an uppercase ASCII letter
  • [a-z]{2,} - two or more lowercase ASCII letters
  • (?=\1) - a positive lookahead that matches a location immediately followed by the same character that was captured into Group 1
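The pieces listed above can be checked one rule at a time against a few small inputs. The sample strings below are made up for illustration. A non-global copy of regex #1 is used so that RegExp.prototype.test does not carry lastIndex state between calls.

```javascript
// Regex #1 without the g flag, so test() always starts scanning at index 0.
const re = /(?<=([\/=]))[A-Z][a-z]{2,}(?=\1)/;

// Hypothetical sample inputs, each exercising one rule from the list above.
const cases = [
  ["=Abc=", true],  // same delimiter on both sides, capital + 2 lowercase letters
  ["/Abc/", true],  // "/" works just like "="
  ["=Abc/", false], // \1 rejects a different closing delimiter
  ["=Ab=", false],  // only one lowercase letter after the capital
  ["=abc=", false], // does not start with an uppercase letter
  ["=A5c=", false], // a digit is not in [a-z]
];

// true for every case whose actual result equals the expected one
const results = cases.map(([input, expected]) => re.test(input) === expected);
console.log(results.every(Boolean)); // true
```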

Note that the second regex does not use lookarounds: it consumes the delimiters, and the word itself is captured into Group 2.
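One practical consequence: because regex #2 consumes both delimiters, two words that share a delimiter (as in =Foo=Bar=) cannot both be matched, since the closing = of the first match is no longer available as the opening = of the second. Regex #1 only asserts the delimiters, so it finds both. The input below is a made-up example.

```javascript
const shared = "=Foo=Bar=";

// Regex #1: delimiters are only asserted by the lookarounds, never consumed.
const withLookarounds = shared.match(/(?<=([\/=]))[A-Z][a-z]{2,}(?=\1)/g);

// Regex #2: delimiters are consumed, so the "=" between Foo and Bar is used up.
const withGroups = Array.from(shared.matchAll(/([\/=])([A-Z][a-z]{2,})\1/g), (m) => m[2]);

console.log(withLookarounds); // ["Foo", "Bar"]
console.log(withGroups);      // ["Foo"]
```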

See the JavaScript demo below:

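 const text = "=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=";
 // With the g flag, match() returns every full match; here only the word, since the delimiters are lookarounds
 console.log( text.match(/(?<=([\/=]))[A-Z][a-z]{2,}(?=\1)/g) ); // → ["Hawai", "Cyprus"]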

Regex #2 test:

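 const text = "=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=";
 const re = /([\/=])([A-Z][a-z]{2,})\1/g;
 let matches = [], m;
 // exec() advances re.lastIndex on each call; m[2] is the word between the delimiters
 while ((m = re.exec(text)) !== null) {
   matches.push(m[2]);
 }
 console.log(matches); // → ["Hawai", "Cyprus"]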
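As an alternative to the manual exec loop, String.prototype.matchAll (ES2020) returns an iterator over all match objects of a global regex, so Group 2 can be collected in a single expression. This is a sketch using the same regex #2, not a change to the pattern itself.

```javascript
const text = "=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=";

// Same regex #2; matchAll throws a TypeError if the g flag is missing.
const re = /([\/=])([A-Z][a-z]{2,})\1/g;

// Map each match object to its Group 2 value (the word between the delimiters).
const words = Array.from(text.matchAll(re), (m) => m[2]);
console.log(words); // ["Hawai", "Cyprus"]
```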

 const sampleDate = `=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=`;

 // see... [https://regex101.com/r/86JtMp/2]
 const regXGroups = /([=\/])(?<content>.*?)\1/g;

 // see... [https://regex101.com/r/86JtMp/1]
 const regXNamedGroups = /(?<delimiter>[=\/])(?<content>.*?)\k<delimiter>/g;

 console.log(
   Array.from(
     sampleDate.matchAll(regXGroups)
   )
   //.map(({ groups }) => groups?.content)
   .map(([match, delimiter, content]) => content)
 );
 console.log(
   [...sampleDate.matchAll(regXNamedGroups)]
   //.map(({ groups }) => groups?.content)
   .map(({ groups: { content } }) => content)
 );

Regarding this requirement from the OP:

"I want to extract only the matches that start with a capital letter and have 3 or more letters in total between the = or / "

To avoid limiting the code to ASCII / Basic Latin with character classes like [A-Z] or [a-zA-Z], one can use Unicode property escapes such as \p{Lu} for any uppercase letter and \p{L} for any letter (they require the u flag). \p{Lu}\p{L}{2,} then means one uppercase letter followed by at least two more letters, i.e. 3 or more letters in total... /([=\/])(\p{Lu}\p{L}{2,})\1/gu ... or... /(?<delimiter>[=\/])(?<content>\p{Lu}\p{L}{2,})\k<delimiter>/gu

 const sampleDate = `=Hawai=/Cyprus/=Invalid/invalid==i5valid=/I5valid/=i=`;

 // see... [https://regex101.com/r/86JtMp/4]
 const regXGroups = /([=\/])(\p{Lu}\p{L}{2,})\1/gu;

 // see... [https://regex101.com/r/86JtMp/3]
 const regXNamedGroups = /(?<delimiter>[=\/])(?<content>\p{Lu}\p{L}{2,})\k<delimiter>/gu;

 console.log(
   Array.from(
     sampleDate.matchAll(regXGroups)
   ).map(([match, delimiter, content]) => content)
 ); // → ["Hawai", "Cyprus"]
 console.log(
   [...sampleDate.matchAll(regXNamedGroups)].map(({ groups: { content } }) => content)
 ); // → ["Hawai", "Cyprus"]
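To see why the Unicode property escapes matter, the sketch below runs the ASCII-only regex #2 and the \p{…} pattern side by side on input containing non-ASCII letters. The sample string is made up for illustration; both patterns are taken unchanged from this thread.

```javascript
// Hypothetical sample with non-ASCII letters; every word is wrapped in matching delimiters.
const sample = "=Łódź=/Zürich/=Ärger=";

// ASCII-only classes: Ł and Ä are not in [A-Z], and ü is not in [a-z].
const asciiOnly = Array.from(sample.matchAll(/([=\/])([A-Z][a-z]{2,})\1/g), (m) => m[2]);

// Unicode property escapes: \p{Lu} = any uppercase letter, \p{L} = any letter (needs the u flag).
const unicodeAware = Array.from(sample.matchAll(/([=\/])(\p{Lu}\p{L}{2,})\1/gu), (m) => m[2]);

console.log(asciiOnly);    // []
console.log(unicodeAware); // ["Łódź", "Zürich", "Ärger"]
```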

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For questions, contact yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM