
Regex to match characters after the last colon that is not within curly brackets

I need a regex that matches (and lists) all 'modifiers' of a string. Modifiers are individual letters after the last : in the string. Modifiers can have variables, which are written in curly brackets, e.g. a{variable}. Variables may contain the character :, which makes it a bit tricky, because we must look for the last : that is NOT between { and }. This is currently my biggest problem, see Example 6 below.

(If it matters, the target language for this will be JavaScript.)

I already have this working for most cases, but there are a few edge cases that I cannot get to work.

My regex so far is:

/(?!.*:)([a-z](\{.*?\})*)/g

Example 1: Single modifier

something:a should match a - working fine

Example 2: Multiple modifiers

something:abc should match a, b, and c - working fine

Example 3: Single modifier with variable

something:a{something} should match a{something} - working fine

Example 4: Single modifier with multiple variables

something:a{something}{something} should match a{something}{something} - working fine

Example 5: Multiple modifiers with variables

something:ab{something}cd{something}{something}efg should match a, b{something}, c, d{something}{something}, e, f, g - working fine

Example 6: Variable containing :

something:a{something:2} - should match a{something:2} - does NOT work. I probably need to modify the negative lookahead somehow to ignore colons in curly brackets, but I couldn't find out how to do that.

Example 7: String not containing a :

something - should match nothing, but matches each letter individually. This may or may not be easy to fix, but my brain currently can't work this out.
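Both failures can be reproduced in a console with the regex from above. This is only a minimal sketch; the variable names are arbitrary.

```javascript
// The regex from the question, unchanged.
const current = /(?!.*:)([a-z](\{.*?\})*)/g;

// Example 6: the last colon sits inside the braces, so the lookahead
// rejects every position up to and including that colon, and nothing
// after it is a letter.
const example6 = 'something:a{something:2}'.match(current);
console.log(example6); // null

// Example 7: with no colon at all the lookahead never fails,
// so every letter is matched on its own.
const example7 = 'something'.match(current);
console.log(example7); // ['s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g']
```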

Here is a link to test / play around with this regex and the examples: https://regexr.com/6h4h0

If anyone can help me to figure out how to make the regex work for example 6 and 7, I'd be very grateful!

You can use

    const regex = /.*:((?:[a-zA-Z](?:{[^{}]*})*)+)$/;
    const extract_rx = /[a-zA-Z](?:{[^{}]*})*/g;
    const texts = [
      'something:a',
      'something:abc',
      'something:a{something}',
      'something:a{something}{something}',
      'something:ab{something}cd{something}{something}efg',
      'something:a{something:2}',
      'something:a{something:2}b{something:3}',
      'something'
    ];
    for (const text of texts) {
      const m = text.match(regex);
      if (m) {
        const matches = m[1].match(extract_rx);
        console.log(text, '=>', matches);
      } else {
        console.log(text, '=> NO MATCH');
      }
    }

See the main regex demo. Details:

  • .*: - matches zero or more characters other than line breaks, as many as possible, and then a : followed by...
  • ((?:[a-zA-Z](?:{[^{}]*})*)+) - Group 1: one or more sequences of
    • [a-zA-Z] - an ASCII letter
    • (?:{[^{}]*})* - zero or more sequences of a {, zero or more characters other than { and }, and then a }
  • $ - end of string.

Once there is a match, Group 1 is parsed again to extract each letter together with the zero or more {...} substrings that immediately follow it.
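The two steps could be wrapped in a small helper, roughly like the sketch below. The name extractModifiers and the constant names are made up for illustration, and the braces are escaped here only for readability. For Example 6, the greedy .* first lands on the colon inside the braces; the rest (2}) cannot match Group 1 up to the end of the string, so the engine backtracks to the earlier colon.

```javascript
// Hypothetical helper; the two patterns follow the ones described above.
const WHOLE = /.*:((?:[a-zA-Z](?:\{[^{}]*\})*)+)$/;
const EACH = /[a-zA-Z](?:\{[^{}]*\})*/g;

function extractModifiers(text) {
  const m = text.match(WHOLE);
  // No colon followed by a valid modifier list up to the end of the string.
  if (!m) return [];
  // Group 1 holds the whole modifier list; split it into single modifiers.
  return m[1].match(EACH) || [];
}

console.log(extractModifiers('something:a{something:2}b{something:3}'));
// ['a{something:2}', 'b{something:3}']
console.log(extractModifiers('something'));
// []
```

Returning an empty array instead of null for the no-match case is a design choice of this sketch, so callers can iterate over the result without a null check.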

What you could do instead is make sure there is a colon somewhere before the matched string, using a positive lookbehind.
Essentially, switch (?!.*:) for (?<=:.*).


    const regex = /(?<=:.*)([a-z](\{.*?\})*)/g;
    const strings = [
      "something:a",
      "something:abc",
      "something:a{something}",
      "something:a{something}{something}",
      "something:ab{something}cd{something}{something}efg",
      "something:a{something:2}",
      "something",
    ];
    for (const string of strings) {
      console.log(string.match(regex));
    }
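One thing that may be worth checking with the lookbehind variant: (?<=:.*) only requires that some colon appears earlier in the string, not that it is the last colon outside braces. The sketch below uses the pattern with an [a-z] class; the last two inputs are made up for illustration and are not among the question's examples. It also assumes an engine with lookbehind support (ES2018).

```javascript
const lookbehind = /(?<=:.*)([a-z](\{.*?\})*)/g;

// The question's Examples 6 and 7 behave as requested.
const ex6 = 'something:a{something:2}'.match(lookbehind);
console.log(ex6); // ['a{something:2}']
const ex7 = 'something'.match(lookbehind);
console.log(ex7); // null

// Made-up inputs: every letter after the FIRST colon is matched,
// including letters before the last colon or inside earlier braces.
const twoColons = 'a:b:c'.match(lookbehind);
console.log(twoColons); // ['b', 'c']
const colonInEarlierBraces = 'x{p:q}y:z'.match(lookbehind);
console.log(colonInEarlierBraces); // ['q', 'y', 'z']
```

If the input can contain more than one colon outside braces, or a braced variable before the modifier list, this difference matters; for the seven examples in the question it does not.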

Not sure if this is what you want:

:([a-z\{.*?\}0-9])*

I would try longer, but have to go catch a flight.


 