Regex to select text based on angle brackets having nested angle brackets

I want to select text based on the scenarios below. I have tried a couple of regexes, but I still cannot cover all the scenarios with a single regex.

Set 1

<x> <y> → result should be two groups: <x> and <y>

<Name> <NewName> → result should be two groups: <Name> and <NewName>

Set 2

sampletext <!PARSE<sampletext>><.value> → result should be two groups: sampletext and <!PARSE<sampletext>><.value>

found <!PARSE<XYZ.ID>notfound> → result should be two groups: <found> and <!PARSE<XYZ.ID>notfound>

<XYZ.IDXX> notfound → result should be two groups: <XYZ.IDXX> and notfound

notFoundString <!PARSE<XYZ.IDXX>notfound> → result should be two groups: <notFoundString> and <!PARSE<XYZ.IDXX>notfound>

notFoundEmpty <!PARSE<XYZ.IDXX>> → result should be two groups: <notFoundEmpty> and <!PARSE<XYZ.IDXX>>

Set 3

<thread.end> <thread.start> → result should be two groups: <thread.end> and <thread.start>

<!MINUS <thread.end> <thread.start>> 1000 → result should be two groups: <!MINUS <thread.end> <thread.start>> and 1000

thread.duration <!DIVISION <!MINUS <thread.end> <thread.start>> 1000> → result should be two groups: thread.duration and <!DIVISION <!MINUS <thread.end> <thread.start>> 1000>

Set 4

1234 5678 → result should be two groups: 1234 and 5678

add.sample.result <!ADD 1234 5678> → result should be two groups: add.sample.result and <!ADD 1234 5678>

Regexes I tried

  1. <([^>]*)>|(\\S+)  This works fine for Sets 1 and 4, but for Sets 2 and 3 it returns more matches than required. https://regexr.com/3si0v

  2. <(.*)>|(\\S+)  This works fine for Sets 2 and 4, but gives wrong results for Sets 1 and 3. https://regexr.com/3si12
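The following sketch reproduces the two failures. It assumes Python's re module (the question does not name a language) and writes both patterns as raw strings with a single \S, on the assumption that the doubled \\S in the question is a string-literal escape; tokens is a hypothetical helper.

```python
import re

# The two attempts from the question, as raw strings (single \S escape).
ATTEMPT_1 = re.compile(r"<([^>]*)>|(\S+)")
ATTEMPT_2 = re.compile(r"<(.*)>|(\S+)")


def tokens(pattern: re.Pattern, text: str) -> list[str]:
    """Hypothetical helper: every full match of `pattern` in `text`, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]


# Attempt 1: [^>]* may contain '<', so the match stops at the FIRST '>'
# and the leftover "><.value>" is picked up by \S+ as an extra token.
print(tokens(ATTEMPT_1, "sampletext <!PARSE<sampletext>><.value>"))
# ['sampletext', '<!PARSE<sampletext>', '><.value>']

# Attempt 2: the greedy .* runs to the LAST '>' on the line, so two
# separate bracketed items are merged into a single token.
print(tokens(ATTEMPT_2, "<x> <y>"))
# ['<x> <y>']
```

In short, the first pattern cannot see past the first closing bracket, and the second cannot stop before the last one; neither tracks how deeply the brackets are nested.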

I need a regex that gives the expected results listed above for all sets.

You may use

((?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+)|(\S+)

See the regex demo.

It matches either (?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+, captured into Group 1, or \S+, captured into Group 2.
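A minimal sketch of applying the pattern line by line, assuming Python's re module and finditer; split_line is a hypothetical helper name. Every match fills exactly one of the two groups, so taking whichever group is not None yields the pieces of each line.

```python
import re

# The answer's pattern: Group 1 = one or more consecutive bracketed
# expressions (up to two nested levels), Group 2 = a run of non-whitespace.
PATTERN = re.compile(
    r"((?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+)|(\S+)"
)


def split_line(line: str) -> list[str]:
    """Hypothetical helper: the text of each match, whichever group captured it."""
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in PATTERN.finditer(line)]


print(split_line("<x> <y>"))
# ['<x>', '<y>']
print(split_line("sampletext <!PARSE<sampletext>><.value>"))
# ['sampletext', '<!PARSE<sampletext>><.value>']
print(split_line("thread.duration <!DIVISION <!MINUS <thread.end> <thread.start>> 1000>"))
# ['thread.duration', '<!DIVISION <!MINUS <thread.end> <thread.start>> 1000>']
```

Because the bracket alternative is tried first at each position, a line such as sampletext <!PARSE<sampletext>><.value> yields the plain word from Group 2 and the whole bracketed run from Group 1.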

Details

  • (?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+ - 1 or more consecutive occurrences of:
    • < - a < char
    • [^<>]* - 0+ chars other than < and >
    • (?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)* - 0+ sequences of:
      • <[^<>]*(?:<[^<>]*>[^<>]*)*> - nested level 1:
        • <[^<>]* - a < and 0+ chars other than < and >
        • (?:<[^<>]*>[^<>]*)* - nested level 2, 0+ sequences of:
          • < - a < char
          • [^<>]* - 0+ chars other than < and >
          • > - a > char
          • [^<>]* - 0+ chars other than < and >
        • > - a > char
      • [^<>]* - 0+ chars other than < and >
    • > - a > char
  • | - or
  • \S+ - 1+ non-whitespace chars
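The pattern hard-codes its depth: inside the outermost <...> it allows bracketed expressions nested at most two levels deep (nested levels 1 and 2 above). The sketch below, assuming Python's re module, builds the same pattern for any depth by repeatedly wrapping the innermost <[^<>]*>; nested_pattern is a hypothetical helper, not part of the original answer. With depth=2 it reproduces the answer's regex exactly.

```python
import re


def nested_pattern(depth: int) -> str:
    """Hypothetical helper: build the answer's regex for `depth` nested levels.

    depth=0 matches only flat <...> groups; each extra level wraps the
    previous bracket pattern as <[^<>]*(?:PREVIOUS[^<>]*)*>.
    """
    bracket = r"<[^<>]*>"  # innermost level: no '<' or '>' inside
    for _ in range(depth):
        bracket = r"<[^<>]*(?:" + bracket + r"[^<>]*)*>"
    return r"((?:" + bracket + r")+)|(\S+)"


# depth=2 gives the pattern from the answer.
print(nested_pattern(2))
# ((?:<[^<>]*(?:<[^<>]*(?:<[^<>]*>[^<>]*)*>[^<>]*)*>)+)|(\S+)

# Three levels inside the outermost brackets is too deep for depth=2:
# Group 1 fails and \S+ swallows the whole text into Group 2.
text = "<a<b<c<d>>>>"
print(re.match(nested_pattern(2), text).groups())
# (None, '<a<b<c<d>>>>')
print(re.match(nested_pattern(3), text).groups())
# ('<a<b<c<d>>>>', None)
```

Python's built-in re has no recursive patterns, so a fixed depth like this is the usual workaround there; engines with recursion support (for example PCRE's (?R)) can match arbitrary nesting instead.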
