I am struggling with finding the correct regular expression for extracting the strings according to the following criteria:
I have an XML fragment with multiple tags. Each element starts with <ABC_xxxx>
and ends with </ABC_xxxx>.
The xxxx part changes for each element. For example:
<ABC_A1S1>1234</ABC_A1S1>
<ABC_uw3ey>1234</ABC_uw3ey>
<ABC_PD4frfr5>1234</ABC_PD4frfr5>
etc...
The number of x is not fixed!
I want to extract each element, including the tags themselves.
How can I do that?
Thanks.
Assuming that there will be no such elements nested inside each other, try this:
\<ABC(\w+)\>[^\<]+\<\/ABC(\1)\>
Explanation:
\<ABC(\w+)\>
is the opening tag that starts with ABC. The characters after ABC are captured in a group (hence the parentheses); we need them later.
[^\<]+
is the body of the element: one or more characters other than an opening angle bracket <.
\<\/ABC(\1)\>
is the closing tag, which starts with ABC and must be followed by exactly the same characters that came after ABC in the opening tag.
\1
is a reference to the first captured group.
Important note: XML is not a regular language, so regular expressions cannot parse it in general. For example, imagine two or more such elements nested inside each other. Use an XML parser to parse XML.
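As a rough illustration of both routes, here is a minimal Python sketch, assuming the fragment looks like the sample in the question. The variable names and the wrapping <root> element are made up for the example; the pattern is the one above with the optional escapes dropped, which Python's re module treats the same way.

```python
import re
import xml.etree.ElementTree as ET

# Sample input taken from the question (assumed to be representative).
fragment = """<ABC_A1S1>1234</ABC_A1S1>
<ABC_uw3ey>1234</ABC_uw3ey>
<ABC_PD4frfr5>1234</ABC_PD4frfr5>"""

# Regex route: \w+ captures everything after "ABC" (including the underscore),
# and \1 forces the closing tag to repeat exactly the same characters.
pattern = re.compile(r"<ABC(\w+)>[^<]+</ABC\1>")
regex_elements = [m.group(0) for m in pattern.finditer(fragment)]

# Parser route: a fragment with several top-level elements is not a complete
# document, so it is wrapped in a hypothetical <root> element first.
root = ET.fromstring(f"<root>{fragment}</root>")
parsed_elements = [
    # tostring() also emits the whitespace after the element, hence strip().
    ET.tostring(child, encoding="unicode").strip()
    for child in root
    if child.tag.startswith("ABC_")
]

for element in regex_elements:
    print(element)
```

On this flat sample both lists hold the same three strings. Only the parser route would keep working if the elements were nested or carried attributes.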
Try this:
<ABC_([^>]*)>([^<]*)<\/ABC_([^>]*)>
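A short Python sketch of how this second pattern behaves, assuming input shaped like the question's sample (the test strings and variable names are invented for illustration). Group 1 is the name suffix of the opening tag, group 2 the body, and group 3 the name suffix of the closing tag. Note that nothing forces group 3 to equal group 1, so mismatched tags are accepted too; the stricter variant with a backreference is shown only for comparison.

```python
import re

# The pattern from this answer; "/" needs no escaping in Python.
pattern = re.compile(r"<ABC_([^>]*)>([^<]*)</ABC_([^>]*)>")

fragment = "<ABC_A1S1>1234</ABC_A1S1><ABC_uw3ey>5678</ABC_uw3ey>"

# (whole element, tag suffix, body) for every match.
matches = [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer(fragment)]

# Opening and closing names differ, yet the pattern still matches.
mismatched = "<ABC_one>1234</ABC_two>"
m = pattern.search(mismatched)
loose_match = m is not None and m.group(1) != m.group(3)

# Comparison variant: a backreference makes the closing name repeat the opening one.
strict = re.compile(r"<ABC_([^>]*)>([^<]*)</ABC_\1>")
strict_match = strict.search(mismatched)

for whole, suffix, body in matches:
    print(suffix, body, whole)
```

If mismatched tags cannot occur in the data, the difference does not matter. Otherwise the backreference form is the safer choice.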