Regex to select text based on angle brackets having nested angle brackets
I want to split text into groups according to the scenarios below. I tried a couple of regexes, but I have not been able to cover all the scenarios with a single regex.
Set 1
<x> <y>
Result should be two groups: <x> and <y>
<Name> <NewName>
Result should be two groups: <Name> and <NewName>
Set 2
sampletext <!PARSE<sampletext>><.value>
Result should be two groups: sampletext and <!PARSE<sampletext>><.value>
found <!PARSE<XYZ.ID>notfound>
Result should be two groups: found and <!PARSE<XYZ.ID>notfound>
<XYZ.IDXX> notfound
Result should be two groups: <XYZ.IDXX> and notfound
notFoundString <!PARSE<XYZ.IDXX>notfound>
Result should be two groups: notFoundString and <!PARSE<XYZ.IDXX>notfound>
notFoundEmpty <!PARSE<XYZ.IDXX>>
Result should be two groups: notFoundEmpty and <!PARSE<XYZ.IDXX>>
Set 3
<thread.end> <thread.start>
Result should be two groups: <thread.end> and <thread.start>
<!MINUS <thread.end> <thread.start>> 1000
Result should be two groups: <!MINUS <thread.end> <thread.start>> and 1000
thread.duration <!DIVISION <!MINUS <thread.end> <thread.start>> 1000>
Result should be two groups: thread.duration and <!DIVISION <!MINUS <thread.end> <thread.start>> 1000>
Set 4
1234 5678
Result should be two groups: 1234 and 5678
add.sample.result <!ADD 1234 5678>
Result should be two groups: add.sample.result and <!ADD 1234 5678>
Regexes I tried
<([^>]*)>|(\S+)
This works for Sets 1 and 4, but in Sets 2 and 3 it captures more groups than required. https://regexr.com/3si0v
<(.*)>|(\S+)
This works for Sets 2 and 4, but gives wrong results in Sets 1 and 3. https://regexr.com/3si12
I need a regex that gives the expected results listed above for all sets.
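The two failure modes described above can be reproduced with a short sketch. Python's `re` module is an assumption here (the question does not name a language), and `tokens` is a hypothetical helper that simply lists the full text of each match.

```python
import re

# The two patterns from the question, written as raw strings.
NEGATED_CLASS = re.compile(r"<([^>]*)>|(\S+)")
GREEDY_DOT = re.compile(r"<(.*)>|(\S+)")


def tokens(pattern, text):
    """Return the full text of every match of `pattern` in `text`."""
    return [m.group(0) for m in pattern.finditer(text)]


# [^>]* stops at the first '>', so a nested bracket cuts the token short
# and the leftover text becomes an extra match.
nested = tokens(NEGATED_CLASS, "found <!PARSE<XYZ.ID>notfound>")
print(nested)  # more tokens than the two expected

# .* is greedy and runs to the last '>' on the line, so two separate
# bracketed tokens are merged into one match.
merged = tokens(GREEDY_DOT, "<x> <y>")
print(merged)  # fewer tokens than the two expected
```

Neither quantifier tracks bracket depth, which is why one attempt over-splits nested input and the other under-splits sibling tokens.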
You may use
((?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+)|(\S+)
See the regex demo.
It matches either the (?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+ pattern or the \S+ pattern, and captures each into its own group (group 1 and group 2, respectively).
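As a quick check, the pattern can be run against sample lines from the question. This is a minimal sketch in Python (an assumption, since the thread does not name a language); `split_groups` is a hypothetical helper name, not part of any library.

```python
import re

# Pattern from the answer: group 1 = bracketed block(s), group 2 = plain word.
PATTERN = re.compile(
    r"((?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+)|(\S+)"
)


def split_groups(text):
    """Return the matched tokens of `text` in order of appearance."""
    # Exactly one of the two groups participates in each match.
    return [m.group(1) or m.group(2) for m in PATTERN.finditer(text)]


samples = [
    "<x> <y>",
    "sampletext <!PARSE<sampletext>><.value>",
    "found <!PARSE<XYZ.ID>notfound>",
    "<!MINUS <thread.end> <thread.start>> 1000",
    "thread.duration <!DIVISION <!MINUS <thread.end> <thread.start>> 1000>",
    "add.sample.result <!ADD 1234 5678>",
]

for line in samples:
    print(split_groups(line))
```

Each sample line should print a list of exactly two tokens. If you need to know which alternative matched, inspect `m.group(1)` and `m.group(2)` separately instead of merging them.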
Details
(?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+ - matches 1 or more consecutive occurrences of:
  < - a <
  [^<>]* - 0+ chars other than < and >
  (?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)* - 0+ sequences of:
    <[^<>]*(?:<[^<>]*>[^<>]*)*> - Nested level 1:
      <[^<>]* - a < and 0+ chars other than < and >
      (?:<[^<>]*>[^<>]*)* - Nested level 2: 0+ sequences of:
        < - a <
        [^<>]* - 0+ chars other than < and >
        > - a >
        [^<>]* - 0+ chars other than < and >
      > - a > char
    [^<>]* - 0+ chars other than < and >
  > - a >
| - or
\S+ - 1+ non-whitespace chars.
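The breakdown above maps directly onto a commented pattern. The sketch below assumes Python and its `re.VERBOSE` flag (the thread does not name a language); `COMPACT`, `ANNOTATED` and `tokens` are hypothetical names used only for illustration.

```python
import re

COMPACT = re.compile(
    r"((?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+)|(\S+)"
)

# The same pattern in verbose mode, one component per line.
ANNOTATED = re.compile(
    r"""
    (                         # group 1: one or more adjacent bracketed blocks
      (?:
        <                     # outer opening bracket
        [^<>]*                # text without brackets
        (?:                   # 0+ level-1 blocks, each followed by plain text
          <[^<>]*             #   level-1 opening bracket and plain text
          (?:<[^<>]*>[^<>]*)* #   0+ level-2 blocks, each followed by plain text
          >                   #   level-1 closing bracket
          [^<>]*              #   plain text after the level-1 block
        )*
        >                     # outer closing bracket
      )+
    )
    |
    (\S+)                     # group 2: any other run of non-whitespace
    """,
    re.VERBOSE,
)


def tokens(pattern, text):
    """Return the full text of every match of `pattern` in `text`."""
    return [m.group(0) for m in pattern.finditer(text)]


line = "thread.duration <!DIVISION <!MINUS <thread.end> <thread.start>> 1000>"
print(tokens(ANNOTATED, line) == tokens(COMPACT, line))

# Four bracket levels exceed what the pattern spells out, so this input
# is not returned as a single bracketed token.
deep = tokens(ANNOTATED, "<a <b <c <d>>>>")
print(deep)
```

Note that the pattern spells out three bracket levels in total (the outer pair plus nested levels 1 and 2). That covers every sample in the question, but deeper nesting would need another nested group added by hand or a different approach.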