I want to select text according to the scenarios below. I tried a couple of regexes, but I still cannot cover all the scenarios with a single regex.
Set 1
<x> <y>
Result should be two groups <x> and <y>
<Name> <NewName>
Result should be two groups <Name> and <NewName>
Set 2
sampletext <!PARSE<sampletext>><.value>
Result should be two groups sampletext and <!PARSE<sampletext>><.value>
found <!PARSE<XYZ.ID>notfound>
Result should be two groups found and <!PARSE<XYZ.ID>notfound>
<XYZ.IDXX> notfound
Result should be two groups <XYZ.IDXX> and notfound
notFoundString <!PARSE<XYZ.IDXX>notfound>
Result should be two groups notFoundString and <!PARSE<XYZ.IDXX>notfound>
notFoundEmpty <!PARSE<XYZ.IDXX>>
Result should be two groups notFoundEmpty and <!PARSE<XYZ.IDXX>>
Set 3
<thread.end> <thread.start>
Result should be two groups <thread.end> and <thread.start>
<!MINUS <thread.end> <thread.start>> 1000
Result should be two groups <!MINUS <thread.end> <thread.start>> and 1000
thread.duration <!DIVISION <!MINUS <thread.end> <thread.start>> 1000>
Result should be two groups thread.duration and <!DIVISION <!MINUS <thread.end> <thread.start>> 1000>
Set 4
1234 5678
Result should be two groups 1234 and 5678
add.sample.result <!ADD 1234 5678>
Result should be two groups add.sample.result and <!ADD 1234 5678>
Regexes I tried
<([^>]*)>|(\\S+)
This works fine for Sets 1 and 4, but for Sets 2 and 3 it returns more matches than required. https://regexr.com/3si0v
<(.*)>|(\\S+)
This works fine for Sets 2 and 4, but gives wrong results for Sets 1 and 3. https://regexr.com/3si12
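Running each attempt over one sample line makes those failures concrete. This is a sketch assuming Python's `re` module (the question does not name a language), with a hypothetical helper `all_matches`; the question's `\\S+` is a string-literal escape, so it appears here as `\S+` inside raw strings.

```python
import re

# The two attempts from the question, as Python raw strings
# (the question's "\\S+" is a string-literal escape for the regex \S+).
ATTEMPT_1 = re.compile(r"<([^>]*)>|(\S+)")
ATTEMPT_2 = re.compile(r"<(.*)>|(\S+)")


def all_matches(pattern, line):
    """Return the full text of every match of `pattern` in `line`, left to right."""
    return [m.group(0) for m in pattern.finditer(line)]


if __name__ == "__main__":
    # Attempt 1: [^>]* stops at the first '>', so nested tokens are split apart.
    print(all_matches(ATTEMPT_1, "<!MINUS <thread.end> <thread.start>> 1000"))
    # Attempt 2: the greedy .* runs to the last '>', merging separate tokens.
    print(all_matches(ATTEMPT_2, "<x> <y>"))
```

The first call prints `['<!MINUS <thread.end>', '<thread.start>', '>', '1000']` (four matches instead of two), and the second prints `['<x> <y>']` (one match instead of two).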
I need a regex that gives the expected results shown above for all sets.
You may use
((?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+)|(\S+)
See the regex demo
It matches either (?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+ (captured into Group 1) or \S+ (captured into Group 2), so each match fills exactly one of the two groups.
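Since the demo link is not reproduced here, the following is a minimal local check of the pattern against every line from the question. It assumes Python's `re` module, which handles this pattern the same way as common engines because it uses only character classes, groups and greedy quantifiers. `SCENARIOS` and `split_line` are hypothetical names; the plain words `found`, `notFoundString` and `notFoundEmpty` are expected without angle brackets, since they carry none in the input.

```python
import re

# The answer's pattern as a Python raw string (the choice of Python's re is an assumption).
PATTERN = re.compile(
    r"((?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+)|(\S+)"
)

# Each input line from the question, paired with the two results it should yield.
SCENARIOS = [
    ("<x> <y>", ["<x>", "<y>"]),
    ("<Name> <NewName>", ["<Name>", "<NewName>"]),
    ("sampletext <!PARSE<sampletext>><.value>", ["sampletext", "<!PARSE<sampletext>><.value>"]),
    ("found <!PARSE<XYZ.ID>notfound>", ["found", "<!PARSE<XYZ.ID>notfound>"]),
    ("<XYZ.IDXX> notfound", ["<XYZ.IDXX>", "notfound"]),
    ("notFoundString <!PARSE<XYZ.IDXX>notfound>", ["notFoundString", "<!PARSE<XYZ.IDXX>notfound>"]),
    ("notFoundEmpty <!PARSE<XYZ.IDXX>>", ["notFoundEmpty", "<!PARSE<XYZ.IDXX>>"]),
    ("<thread.end> <thread.start>", ["<thread.end>", "<thread.start>"]),
    ("<!MINUS <thread.end> <thread.start>> 1000", ["<!MINUS <thread.end> <thread.start>>", "1000"]),
    ("thread.duration <!DIVISION <!MINUS <thread.end> <thread.start>> 1000>",
     ["thread.duration", "<!DIVISION <!MINUS <thread.end> <thread.start>> 1000>"]),
    ("1234 5678", ["1234", "5678"]),
    ("add.sample.result <!ADD 1234 5678>", ["add.sample.result", "<!ADD 1234 5678>"]),
]


def split_line(line):
    """Return one string per match: group 1 (bracketed run) or group 2 (plain token)."""
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in PATTERN.finditer(line)]


if __name__ == "__main__":
    for line, expected in SCENARIOS:
        result = split_line(line)
        status = "OK  " if result == expected else "FAIL"
        print(status, line, "->", result)
```

Because each match sets only one of the two groups, `split_line` simply takes whichever is present; every scenario line should come back as exactly two results.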
Details
(?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+ - matches 1 or more consecutive occurrences of:
  < - a <
  [^<>]* - 0+ chars other than < and >
  (?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)* - 0+ sequences of:
    <[^<>]*(?:<[^<>]*>[^<>]*)*> - nested level 1:
      <[^<>]* - a < and 0+ chars other than < and >
      (?:<[^<>]*>[^<>]*)* - nested level 2: 0+ sequences of:
        < - a <
        [^<>]* - 0+ chars other than < and >
        > - a >
        [^<>]* - 0+ chars other than < and >
      > - a >
    [^<>]* - 0+ chars other than < and >
  > - a >
| - or
\S+ - 1+ non-whitespace chars.
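The breakdown shows the same building block nested three times, which also means the pattern only handles up to three levels of <...> nesting (the deepest sample, the <!DIVISION ...> line, uses exactly three). A small Python sketch, with hypothetical helper names `nested` and `full_pattern`, rebuilds the pattern for a chosen depth, so deeper input could be handled by generating one more level.

```python
import re

# "0+ chars other than < and >" from the breakdown above.
NON_BRACKET = r"[^<>]*"


def nested(depth):
    """Regex for one <...> token whose angle brackets nest at most `depth` levels.

    Depth 1 is <[^<>]*>. Each extra level allows 0+ inner tokens (one level
    shallower), each followed by 0+ non-bracket chars, before the closing >.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if depth == 1:
        return "<" + NON_BRACKET + ">"
    return "<" + NON_BRACKET + "(?:" + nested(depth - 1) + NON_BRACKET + ")*>"


def full_pattern(depth):
    """Group 1: 1+ consecutive bracketed tokens; group 2: a run of non-whitespace."""
    return "((?:" + nested(depth) + r")+)|(\S+)"


if __name__ == "__main__":
    print(full_pattern(3))
    deep = "<a <b <c <d>>>>"  # four levels of nesting
    print(re.fullmatch(nested(3), deep))  # None: one level too deep for depth 3
    print(re.fullmatch(nested(4), deep) is not None)
```

`full_pattern(3)` reproduces the pattern from the answer character for character; `nested(3)` does not fully match `<a <b <c <d>>>>`, while `nested(4)` does.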