
Regex to select text based on angle brackets having nested angle brackets

I want to split text into groups according to the scenarios below. I tried a couple of regexes, but I have not been able to cover all the scenarios with a single one.

Set 1

<x> <y> Result should be two groups <x> and <y>

<Name> <NewName> Result should be two groups <Name> and <NewName>

Set 2

sampletext <!PARSE<sampletext>><.value> Result should be two groups sampletext and <!PARSE<sampletext>><.value>

found <!PARSE<XYZ.ID>notfound> Result should be two groups <found> and <!PARSE<XYZ.ID>notfound>

<XYZ.IDXX> notfound Result should be two groups <XYZ.IDXX> and notfound

notFoundString <!PARSE<XYZ.IDXX>notfound> Result should be two groups <notFoundString> and <!PARSE<XYZ.IDXX>notfound>

notFoundEmpty <!PARSE<XYZ.IDXX>> Result should be two groups <notFoundEmpty> and <!PARSE<XYZ.IDXX>>

Set 3

<thread.end> <thread.start> Result should be two groups <thread.end> and <thread.start>

<!MINUS <thread.end> <thread.start>> 1000 Result should be two groups <!MINUS <thread.end> <thread.start>> and 1000

thread.duration <!DIVISION <!MINUS <thread.end> <thread.start>> 1000> Result should be two groups thread.duration and <!DIVISION <!MINUS <thread.end> <thread.start>> 1000>

Set 4

1234 5678 Result should be two groups 1234 and 5678

add.sample.result <!ADD 1234 5678> Result should be two groups add.sample.result and <!ADD 1234 5678>

Regexes I tried

  1. <([^>]*)>|(\S+) This works for Sets 1 and 4, but in Sets 2 and 3 it captures more groups than required. https://regexr.com/3si0v

  2. <(.*)>|(\S+) This works for Sets 2 and 4, but gives wrong results in Sets 1 and 3. https://regexr.com/3si12
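To make the two failure modes concrete, here is a minimal Python sketch that runs both attempts on one input from Set 3 and one from Set 1. The helper name tokens and the choice of Python are assumptions for illustration only; the question does not say which regex engine is in use.

```python
import re

# The two patterns from the list above.
ATTEMPT_1 = re.compile(r"<([^>]*)>|(\S+)")
ATTEMPT_2 = re.compile(r"<(.*)>|(\S+)")

def tokens(pattern, text):
    """Return the full text of every match, in order (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]

# Attempt 1: [^>]* stops at the first '>', so a nested expression is cut short
# and the leftover '>' becomes a token of its own.
print(tokens(ATTEMPT_1, "<!MINUS <thread.end> <thread.start>> 1000"))
# ['<!MINUS <thread.end>', '<thread.start>', '>', '1000']

# Attempt 2: .* is greedy and runs to the last '>', so two separate
# bracketed items are merged into one token.
print(tokens(ATTEMPT_2, "<x> <y>"))
# ['<x> <y>']
```

The first pattern is too narrow for nested brackets and the second is too wide for adjacent ones, which is why neither covers all four sets.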

I need a regex that gives the expected results listed above for all sets.

You may use

((?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+)|(\S+)

See the regex demo

The pattern matches one of two alternatives: (?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+, captured into group 1, or \S+, captured into group 2.
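As a quick check, the sketch below applies the pattern to one input from each set in the question. It assumes Python's re module; the helper name split_groups is hypothetical and not part of the original answer.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(
    r"((?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+)|(\S+)"
)

def split_groups(text):
    """Return each match: group 1 for bracketed items, group 2 otherwise."""
    return [m.group(1) or m.group(2) for m in PATTERN.finditer(text)]

samples = [
    "<Name> <NewName>",                                    # Set 1
    "sampletext <!PARSE<sampletext>><.value>",             # Set 2
    "thread.duration <!DIVISION <!MINUS <thread.end> <thread.start>> 1000>",  # Set 3
    "add.sample.result <!ADD 1234 5678>",                  # Set 4
]

for line in samples:
    # Each sample should split into exactly two items.
    print(split_groups(line))
```

Each sample yields exactly two items, for example ['sampletext', '<!PARSE<sampletext>><.value>'] for the Set 2 input.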

Details

  • (?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+ - matches 1 or more consecutive occurrences of
    • < - a <
    • [^<>]* - 0+ chars other than < and >
    • (?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)* - 0+ sequences of
      • <[^<>]*(?:<[^<>]*>[^<>]*)*> - Nested level 1:
      • <[^<>]* - < and 0+ chars other than < and >
      • (?:<[^<>]*>[^<>]*)* - Nested level 2: 0+ sequences of
        • < - a <
        • [^<>]* - 0+ chars other than < and >
        • > - a >
        • [^<>]* - 0+ chars other than < and >
      • > - a > char
      • [^<>]* - 0+ chars other than < and >
    • > - a >
  • | - or
  • \S+ - 1+ non-whitespace chars.
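The breakdown above shows that each nesting level is the same fragment wrapped around the previous one, so the pattern handles a fixed depth only: two levels of brackets inside the outermost pair. If deeper nesting is expected, one option is to generate the pattern for a chosen depth. The following is a sketch under that assumption; build_pattern and split_with_depth are hypothetical helpers, not part of the original answer.

```python
import re

def build_pattern(depth):
    """Build the regex for `depth` nested levels inside the outer brackets."""
    inner = ""
    for _ in range(depth):
        # Wrap the previous level: '<', plain chars, deeper levels, '>', plain chars.
        inner = "(?:<[^<>]*" + inner + ">[^<>]*)*"
    # Group 1: one or more adjacent bracketed items; group 2: any other token.
    return "((?:<[^<>]*" + inner + ">)+)|(\\S+)"

def split_with_depth(depth, text):
    """Return the full text of every match for the given nesting depth."""
    return [m.group(0) for m in re.finditer(build_pattern(depth), text)]

# depth=2 reproduces the pattern given in the answer.
print(build_pattern(2))

# Three nested levels need depth=3 to be kept together as one item.
print(split_with_depth(3, "<a <b <c <d>>>>"))
# ['<a <b <c <d>>>>']
```

With depth=2 the same input is no longer returned as a single item, which illustrates the fixed-depth limit. Engines that support recursion or balancing groups can avoid the limit, but Python's built-in re module does not offer those features.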
