
Java RegExp: Finding the correct regular expression

I am struggling to find the correct regular expression for extracting strings that match the following criteria:

I have an XML fragment with multiple elements. Each element starts with <ABC_xxxx> and ends with </ABC_xxxx>.

The xxxx changes for each element. For example:

 <ABC_A1S1>1234</ABC_A1S1>
 <ABC_uw3ey>1234</ABC_uw3ey>
 <ABC_PD4frfr5>1234</ABC_PD4frfr5>

etc...

The number of x characters is not fixed!

I want to extract each element, including the tags themselves.

How can I do that?

Thanks.

Assuming that there will be no such elements nested inside each other, try this:

\<ABC(\w+)\>[^\<]+\<\/ABC(\1)\>

Explanation:

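  • \<ABC(\w+)\> is the opening tag. It starts with ABC, and the word characters after ABC (letters, digits and underscores, e.g. _A1S1) are captured in a group, hence the parentheses. They are needed again for the closing tag.
  • [^\<]+ is the body of the element: one or more characters other than an opening angle bracket. Because of the +, an empty element such as <ABC_x></ABC_x> does not match; use * instead if it should.
  • \<\/ABC(\1)\> is the closing tag. It starts with </ABC and must be followed by exactly the same characters that followed ABC in the opening tag. \1 is a backreference to the first captured group.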
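A minimal Java sketch of this pattern in use, assuming the whole fragment is available as a single String. The class name AbcElementExtractor is hypothetical. In a Java string literal every backslash of the regex has to be doubled; Matcher.find() is called in a loop to collect every match, and group() returns the whole match, tags included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AbcElementExtractor {

    // The regex from the answer, written as a Java string literal:
    // each backslash in the regex is doubled inside the quotes.
    private static final Pattern ABC_ELEMENT =
            Pattern.compile("\\<ABC(\\w+)\\>[^\\<]+\\<\\/ABC(\\1)\\>");

    /** Returns every non-nested <ABC_xxxx>...</ABC_xxxx> element, tags included. */
    static List<String> extract(String xml) {
        List<String> elements = new ArrayList<>();
        Matcher m = ABC_ELEMENT.matcher(xml);
        while (m.find()) {
            elements.add(m.group()); // whole match: opening tag, body, closing tag
        }
        return elements;
    }

    public static void main(String[] args) {
        String xml = "<ABC_A1S1>1234</ABC_A1S1>\n"
                + "<ABC_uw3ey>1234</ABC_uw3ey>\n"
                + "<ABC_PD4frfr5>1234</ABC_PD4frfr5>";
        for (String element : extract(xml)) {
            System.out.println(element);
        }
    }
}
```

Running main prints the three example elements, one per line. A pair with different names, such as <ABC_x>1</ABC_y>, is not matched because of the \1 backreference.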

Important note: XML is not a regular language, so regular expressions cannot parse it in general. For example, imagine two or more such elements nested inside each other. Use an XML parser to parse XML.
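Following that advice, here is a sketch that uses the JDK's built-in DOM parser (javax.xml.parsers) instead of a regex. Assumptions: the fragment has several top-level elements, so it is wrapped in a hypothetical <root> element to make it well-formed; AbcDomExtractor is a hypothetical name; an element is wanted when its tag name starts with ABC_. Each wanted element is serialized back to text with an identity Transformer so that the tags are included.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AbcDomExtractor {

    static List<String> extract(String fragment) throws Exception {
        // Several top-level elements do not form a well-formed document,
        // so wrap the fragment in a hypothetical <root> element first.
        String wrapped = "<root>" + fragment + "</root>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));

        // Identity transformer used only to turn an element back into text.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> elements = new ArrayList<>();
        // "*" selects every element at any depth, in document order.
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.getTagName().startsWith("ABC_")) {
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(e), new StreamResult(out));
                elements.add(out.toString());
            }
        }
        return elements;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ABC_A1S1>1234</ABC_A1S1><ABC_uw3ey>1234</ABC_uw3ey>";
        for (String element : extract(xml)) {
            System.out.println(element);
        }
    }
}
```

Unlike the regex, this also finds both elements in a nested fragment such as <ABC_a><ABC_b>1</ABC_b></ABC_a>, outer element first.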

Try this:

<ABC_([^>]*)>([^<]*)<\/ABC_([^>]*)>
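This pattern captures the closing tag's name in its own group, so on its own it also accepts mismatched pairs such as <ABC_x>5</ABC_y>; because the body uses *, it also matches empty elements. A minimal Java sketch that reads the three groups and keeps only matches whose names agree. The class name AbcGroupExtractor and the name check are illustrative additions, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AbcGroupExtractor {

    // The pattern above as a Java string literal; the backslash before "/"
    // is doubled for Java (a plain "/" would work just as well).
    private static final Pattern ABC_ELEMENT =
            Pattern.compile("<ABC_([^>]*)>([^<]*)<\\/ABC_([^>]*)>");

    /** Returns each whole match whose opening and closing tag names agree. */
    static List<String> extract(String xml) {
        List<String> elements = new ArrayList<>();
        Matcher m = ABC_ELEMENT.matcher(xml);
        while (m.find()) {
            String openName = m.group(1);  // text after "ABC_" in the opening tag
            String body = m.group(2);      // element content, may be empty
            String closeName = m.group(3); // text after "ABC_" in the closing tag
            // The regex alone accepts <ABC_x>...</ABC_y>, so compare the names here.
            if (openName.equals(closeName)) {
                elements.add(m.group()); // whole element, tags included
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        String xml = "<ABC_A1S1>1234</ABC_A1S1><ABC_x>5</ABC_y><ABC_uw3ey>1234</ABC_uw3ey>";
        for (String element : extract(xml)) {
            System.out.println(element);
        }
    }
}
```

An alternative with the same effect is a backreference, as in the first answer: <ABC_([^>]*)>([^<]*)<\/ABC_\1>.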
