
How to extract text between two different XML tags across multiple lines

For example, suppose we have some XML like this:

<parent>
    <child>SomeText</child>sometext<otherChild>sometext</otherChild>
    <child>SomeText2</child>somtext2<otherChild>sometext2</otherChild>
</parent>

Which regex can be applied to extract the content after each </child> and before the next <child>? The first match should be sometext<otherChild>sometext</otherChild>, and the second match should be somtext2<otherChild>sometext2</otherChild>.

I already tried a regex like this, but it only returns the first match:

String textToParse = ...;
Pattern pattern = Pattern.compile("(?<=</child>)(.*?)(?=<child>)", Pattern.DOTALL);

final Matcher matcher = pattern.matcher(textToParse);
if (matcher.find()) {
    LOGGER.info(matcher.group());
}

This should work:

Pattern pattern = Pattern.compile("(?<=</child>)(.*?)(?=<child>|</parent>)", Pattern.DOTALL);

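The |</parent> alternative is needed because the last match has no following <child> tag.

Also, call matcher.find() repeatedly (for example, in a while loop) and read matcher.group() after each successful call to get every match; a single if only returns the first one.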
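
Putting both points together, a minimal sketch might look like the following. It reuses the pattern from this answer and the sample XML from the question; the class name ChildGapExtractor and the method extractGaps are hypothetical names chosen for illustration, and the trim() call is an added assumption to drop the newline and indentation that the regex captures before the next tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here for illustration only.
class ChildGapExtractor {

    // Pattern from the answer: text after </child>, up to the next <child> or </parent>.
    private static final Pattern GAP =
            Pattern.compile("(?<=</child>)(.*?)(?=<child>|</parent>)", Pattern.DOTALL);

    static List<String> extractGaps(String textToParse) {
        List<String> gaps = new ArrayList<>();
        Matcher matcher = GAP.matcher(textToParse);
        // Each successful find() advances the matcher to the next match.
        while (matcher.find()) {
            // trim() is an added assumption: it removes the line break and
            // indentation captured between </otherChild> and the next tag.
            gaps.add(matcher.group(1).trim());
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Sample XML taken from the question.
        String xml = "<parent>\n"
                + "    <child>SomeText</child>sometext<otherChild>sometext</otherChild>\n"
                + "    <child>SomeText2</child>somtext2<otherChild>sometext2</otherChild>\n"
                + "</parent>";
        for (String gap : extractGaps(xml)) {
            System.out.println(gap);
        }
    }
}
```

Without the trim() call, each captured string would also include the whitespace that separates it from the following tag, because Pattern.DOTALL lets . match line terminators.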
