
Java string - get everything between (but not including) two regular expressions?

In Java, is there a simple way to extract a substring by specifying the regular expression delimiters on either side, without including the delimiters in the final substring?

For example, if I have a string like this:

<row><column>Header text</column></row>

what is the easiest way to extract the substring:

Header text

Please note that the substring may contain line breaks...

Thanks!

Write a regex like this:

"(regex1)(.*)(regex2)"

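... and pull the middle group out of the matcher (group 2 here, because the two delimiters are captured as groups 1 and 3). To let . match line breaks in the input, compile the pattern with Pattern.DOTALL. Note that .* is greedy: if the closing delimiter can occur more than once in the input, use the reluctant .*? so the match stops at the first one.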

Using your example we can write a program like:

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {

    public static void main(String[] args) {
        Pattern p = Pattern.compile(
                "<row><column>(.*)</column></row>",
                Pattern.DOTALL
            );

        Matcher matcher = p.matcher(
                "<row><column>Header\n\n\ntext</column></row>"
            );

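        // matches() succeeds only if the whole input matches the pattern;
        // use find() to locate the pattern inside a longer string.
        if (matcher.matches()) {
            // Group 1 is the text between the two delimiters.
            System.out.println(matcher.group(1));
        }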
    }

}

When run, this prints:

Header


text
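The same idea can be made a little more general. The following is only a sketch: the class name Between and the helper between are made up for illustration, and it assumes the two delimiter regexes contain no capturing groups of their own (otherwise the group numbering shifts). It wraps the delimiters in non-capturing groups, uses the reluctant .*? and calls find() so that several delimited sections in one string are all returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Between {

    // Hypothetical helper: returns every substring found between a match of
    // `open` and the nearest following match of `close`, delimiters excluded.
    // Assumes `open` and `close` contain no capturing groups themselves.
    static List<String> between(String open, String close, String input) {
        // (?:...) groups a delimiter without capturing it, so the text in
        // the middle is group 1. ".*?" is reluctant: it stops at the first
        // possible closing delimiter instead of the last one.
        Pattern p = Pattern.compile(
                "(?:" + open + ")(.*?)(?:" + close + ")",
                Pattern.DOTALL);
        Matcher m = p.matcher(input);
        List<String> result = new ArrayList<>();
        // find() looks for the next match anywhere in the input, whereas
        // matches() requires the whole input to match.
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input =
                "<row><column>Header\ntext</column><column>Second</column></row>";
        // Prints the list of both column contents.
        System.out.println(between("<column>", "</column>", input));
    }
}
```

With a greedy .* in place of .*?, the first match would run from the first <column> to the last </column> and swallow the tags in between.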

You should not use regular expressions to parse XML - this will eventually break if the input is not strictly controlled.

The easiest thing is probably to parse the XML into a DOM tree (Java 1.4 and newer include an XML parser) and then navigate the tree to pick out what you need.
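As a rough sketch of the DOM approach, using the XML parser that ships with the JDK: the class name DomExtract and the helper firstColumnText are made up for illustration, and the code assumes the text you want is in the first <column> element. getTextContent() was added in Java 5, so on Java 1.4 you would read the child text nodes instead.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomExtract {

    // Hypothetical helper: parses the XML string and returns the text of the
    // first <column> element, or null if the document has no such element.
    static String firstColumnText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList columns = doc.getElementsByTagName("column");
            if (columns.getLength() == 0) {
                return null;
            }
            // getTextContent() returns the element's text, line breaks included.
            return columns.item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed input ends up here instead of yielding a wrong match.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<row><column>Header\n\n\ntext</column></row>";
        System.out.println(firstColumnText(xml));
    }
}
```

Unlike the regex, this keeps working when the input gains attributes, extra whitespace between tags, or entity references such as &amp;.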

Perhaps you could tell us what you want to accomplish with your program?
