Is there a way, using a regex, to get the string that starts at <Detail> and ends at the next occurrence of <Detail>? Given this input:
<Details>
<Detail>
<Name>Donald</Name>
<Age>10</Age>
</Detail>
<Detail>
<Name>Donald2</Name>
<Age>102</Age>
</Detail>
</Details>
I want to get:
<Detail>
<Name>Donald</Name>
<Age>10</Age>
</Detail>
<Detail>
Assuming you're using Perl (or a compatible regex engine):
m{
<Detail> # match <Detail>
.*? # ... followed by 0 or more of any character, as few as possible
<Detail> # ... followed by another <Detail>
}xs
The x flag lets the pattern contain whitespace and # comments. The s flag makes . match any character (including newline).
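To see what the s flag changes, here is a minimal sketch that assumes Python's `re` module as the "compatible regex engine" (the original answer is Perl). `re.VERBOSE` and `re.DOTALL` play the roles of Perl's x and s flags. The variable names are illustrative only.

```python
import re

text = """<Details>
<Detail>
<Name>Donald</Name>
<Age>10</Age>
</Detail>
<Detail>
<Name>Donald2</Name>
<Age>102</Age>
</Detail>
</Details>"""

pattern = r"""
    <Detail>   # match <Detail>
    .*?        # ... followed by 0 or more of any character, as few as possible
    <Detail>   # ... followed by another <Detail>
"""

# Without DOTALL, "." cannot cross a newline, so the two tags never connect.
no_flag = re.search(pattern, text, re.VERBOSE)

# With DOTALL (the counterpart of Perl's s flag), "." also matches "\n".
with_flag = re.search(pattern, text, re.VERBOSE | re.DOTALL)

print(no_flag)             # None
print(with_flag.group(0))  # the first <Detail> through the second <Detail>
```

Note that `<Details>` on the first line is not matched by `<Detail>`, because the pattern requires `>` directly after `Detail`.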
Here's a regex that might work, with a bit of added flexibility:
<(Detail)>[\s\S]*?<\1>
How does it work?
< > Match the less-than and greater-than characters literally
( ) Create a "capturing group" - this lets you reference its matched value later on. Useful, because with it we can match the same tag name again without retyping it.
Detail Match the word "Detail" literally
[\s\S] Match any whitespace character OR any non-whitespace character, i.e. any character at all, including newlines
*? Match as FEW of these as possible, so that you stop at the first available <Detail> that follows. Without the question mark, it will grab as many characters as it can, meaning it'll run on to the LAST <Detail> instead.
\1 Reference to the first capturing group. If you change "Detail" to something else inside the parentheses, this changes automatically, too.
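The difference between `*?` and `*`, and the effect of the `\1` backreference, can be checked with a short sketch. It assumes Python's `re` module, which accepts the same syntax for this pattern. The three-element sample string is made up for illustration.

```python
import re

# Three <Detail> elements, so the lazy and greedy results differ visibly.
text = "<Detail>a</Detail>\n<Detail>b</Detail>\n<Detail>c</Detail>"

# Lazy: stop at the first <Detail> that follows the opening one.
lazy = re.search(r"<(Detail)>[\s\S]*?<\1>", text)

# Greedy: run on to the last <Detail> in the text.
greedy = re.search(r"<(Detail)>[\s\S]*<\1>", text)

print(repr(lazy.group(0)))
print(repr(greedy.group(0)))
print(lazy.group(1))  # the text captured by the group that \1 refers to
```

Because the tag name appears only once, inside the parentheses, changing it there is enough to target a different tag.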
Not sure what flavour you want to use, but /<Detail>.*?<Detail>/s would work fine in Perl.
The /s modifier tells Perl to treat the entire text as a single line. This has the effect that the . in the pattern matches newline, as well as any other character.
The match proceeds like this:
1. <Detail>: The matcher finds the first <Detail>.
2. . with quantifier *?: the quantifier is zero or more, minimal matching, so for now the matcher tries "zero matches". SUCCEEDS
3. <Detail>: This attempted match FAILS, so the matcher backtracks to step 2, lets .*? take one more character, and tries step 3 again. FAILS
We have this merry dance going on, with step 3 inching through the string until the next literal <Detail> appears.
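The step-by-step behaviour described above can be imitated by hand. The following is only a rough sketch in Python, not how any real engine is implemented: `lazy_match` is a hypothetical helper that tries the second `<Detail>` after 0, 1, 2, ... consumed characters and counts the attempts. It ignores the retry-from-a-later-start-position logic a full engine has, which does not matter for this pattern.

```python
import re

def lazy_match(text, tag="<Detail>"):
    """Hand-rolled version of <Detail>.*?<Detail> where "." also matches newlines."""
    start = text.find(tag)          # step 1: locate the first literal <Detail>
    if start == -1:
        return None, 0
    pos = start + len(tag)          # position right after the first tag
    attempts = 0
    # Step 2: let .*? consume 0, then 1, then 2, ... characters.
    for consumed in range(len(text) - pos + 1):
        attempts += 1
        # Step 3: try the second literal <Detail> at the current position.
        if text.startswith(tag, pos + consumed):
            return text[start:pos + consumed + len(tag)], attempts
    return None, attempts

sample = "<Detail>\nab\n<Detail>"
matched, attempts = lazy_match(sample)
print(repr(matched), attempts)

# The real engine returns the same text as the hand-rolled walk-through.
engine = re.search(r"<Detail>.*?<Detail>", sample, re.DOTALL).group(0)
print(matched == engine)
```

Each failed attempt at step 3 corresponds to one more character handed to `.*?`, which is why the lazy quantifier stops at the nearest following `<Detail>`.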
The Perl looks like this:
'<Details>
<Detail>
<Name>Donald</Name>
<Age>10</Age>
</Detail>
<Detail>
<Name>Donald2</Name>
<Age>102</Age>
</Detail>
</Details>
<Detail>' =~ /<Detail>.*?<Detail>/s and print "[$&]\n"
giving this output:
[<Detail>
<Name>Donald</Name>
<Age>10</Age>
</Detail>
<Detail>]