
Regex to match from one string up to the first occurrence of another string, including newline characters

Is there a way, using a regex, to get the text that starts at <Detail> and ends at the next occurrence of <Detail>?

Input

<Details>
<Detail>
<Name>Donald</Name>
<Age>10</Age>
</Detail>
<Detail>
<Name>Donald2</Name>
<Age>102</Age>
</Detail>
</Details>

Output

<Detail>
<Name>Donald</Name>
<Age>10</Age>
</Detail>
<Detail>

Assuming you're using Perl (or a compatible regex engine):

m{
    <Detail>   # match <Detail>
    .*?        # ... followed by 0 or more of any character, as few as possible
    <Detail>   # ... followed by another <Detail>
}xs

The s flag makes . match any character (including newline).
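
For readers on a different engine, here is a rough Python equivalent, offered as a sketch rather than a definitive port. It assumes Python's re module, where re.DOTALL plays the role of Perl's s flag and re.VERBOSE the role of x. The variable names (xml_text, pattern, first_span) are illustrative; the sample text is the question's input.

```python
import re

# Sample input copied from the question.
xml_text = """<Details>
<Detail>
<Name>Donald</Name>
<Age>10</Age>
</Detail>
<Detail>
<Name>Donald2</Name>
<Age>102</Age>
</Detail>
</Details>"""

pattern = re.compile(
    r"""
    <Detail>   # match <Detail>
    .*?        # ... followed by as few characters as possible
    <Detail>   # ... followed by another <Detail>
    """,
    re.VERBOSE | re.DOTALL,  # VERBOSE ~ Perl's x flag, DOTALL ~ Perl's s flag
)

match = pattern.search(xml_text)
first_span = match.group(0) if match else None
print(first_span)
```

Without re.DOTALL the same pattern finds nothing in this input, because . then refuses to cross the newline that follows the first <Detail>.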

Here's a regex that might work, with a bit of added flexibility:

<(Detail)>[\s\S]*?<\1>

How's it work?

<        >        <  >   Match the lt and gt characters literally
 (      )                Create a "capturing group", which lets you refer back to the text it matched. Useful here, because the second tag is then guaranteed to have the same name as the first.
  Detail                 Match the word "Detail" literally
          [\s\S]         Match any whitespace OR non-whitespace character, i.e. any character at all, including newlines
                *?       Match as FEW of these as possible, so that the match stops at the first following <Detail>. Without the question mark, it will grab as many characters as it can, meaning the match runs on to the LAST <Detail> instead.
                   \1    Backreference to the first capturing group. If you change "Detail" to something else inside the parentheses, this follows automatically.
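
To make the difference between *? and * concrete, here is a small Python sketch. The three-block sample string is made up for illustration (the question's input has only two <Detail> tags, so lazy and greedy matching would give the same result there), and the variable names are likewise illustrative.

```python
import re

# Illustrative input with three <Detail> blocks (not taken from the question).
sample = "<Detail>\nA\n</Detail>\n<Detail>\nB\n</Detail>\n<Detail>\nC\n</Detail>"

lazy = re.search(r"<(Detail)>[\s\S]*?<\1>", sample)    # *? : as few as possible
greedy = re.search(r"<(Detail)>[\s\S]*<\1>", sample)   # *  : as many as possible

lazy_text = lazy.group(0)      # stops at the second <Detail>
greedy_text = greedy.group(0)  # runs on to the third (last) <Detail>
tag_name = lazy.group(1)       # text captured by group 1, reused by \1

print(repr(lazy_text))
print(repr(greedy_text))
```

No flag is needed here: [\s\S] matches newlines by itself, which is why this form is handy in engines that have no s/DOTALL switch.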


Not sure what flavour you want to use, but /<Detail>.*?<Detail>/s would work fine in Perl.

  1. The /s modifier tells Perl to treat the entire text as a single line, so that . matches a newline as well as any other character.
  2. Literal text <Detail>: the matcher finds the first <Detail>.
  3. Pattern . with quantifier *?: the quantifier means zero or more with minimal matching, so the matcher first tries "zero matches". SUCCEEDS
  4. Literal <Detail>: this attempted match FAILS, because the next character is a newline.
  5. The matcher steps back to step 3, but this time tries "one match". It finds a single "any character". SUCCEEDS
  6. We find ourselves at step 4 again, looking for a literal <Detail>. FAILS
  7. Back to step 3, but now we try "two matches", and so on.

We have this merry dance going on, with step 3 inching through the string until the next literal <Detail> appears.
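
The dance can be imitated by hand. The Python sketch below is only a simplified model of the steps above, not how a real regex engine is implemented; the function name lazy_match and its parameters are hypothetical, and the regex call at the end is there purely for comparison.

```python
import re


def lazy_match(text, opener="<Detail>", closer="<Detail>"):
    """Mimic /<Detail>.*?<Detail>/s by hand; return the matched text or None."""
    start = text.find(opener)              # step 2: find the first literal <Detail>
    if start == -1:
        return None
    body_start = start + len(opener)
    consumed = 0                           # step 3: start with "zero matches" for .*?
    while body_start + consumed <= len(text):
        pos = body_start + consumed
        if text.startswith(closer, pos):   # steps 4 and 6: try the literal <Detail>
            return text[start:pos + len(closer)]
        consumed += 1                      # steps 5 and 7: let . take one more character
    return None                            # ran off the end without a second <Detail>


# Sample input copied from the question.
xml_text = (
    "<Details>\n<Detail>\n<Name>Donald</Name>\n<Age>10</Age>\n</Detail>\n"
    "<Detail>\n<Name>Donald2</Name>\n<Age>102</Age>\n</Detail>\n</Details>"
)

by_hand = lazy_match(xml_text)
by_regex = re.search(r"<Detail>.*?<Detail>", xml_text, re.DOTALL).group(0)
print(by_hand == by_regex)
```

Each pass through the loop corresponds to one round of "try the literal, fail, let . consume one more character".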

The Perl looks like this:

'<Details>
<Detail>
<Name>Donald</Name>
<Age>10</Age>
</Detail>
<Detail>
<Name>Donald2</Name>
<Age>102</Age>
</Detail>
</Details>
<Detail>' =~ /<Detail>.*?<Detail>/s and print "[$&]\n"

giving this output:

[<Detail>
<Name>Donald</Name>
<Age>10</Age>
</Detail>
<Detail>]
