
Can't understand why my regex in Java doesn't work

I'm trying to find the pieces of text on a web page I fetch that lie between the substrings ' align="left">\\n ' and ' </form>\\n</td> '.

I wrote a regex:

(align=\"left\">\\n)(?<part>.*?)(<\/form>\\n<\/td>)

and tested it at https://www.freeformatter.com/java-regex-tester.html where it works just as I need.

But in the Java code it can't find anything.

The test code that I'm trying to get working:

String frontPage = "<html>\n<head>\n<title>Hello</title>\n</head>\n" + 
            "<body>\n<table>\n<tr align=\"left\">\n" + 
            "<td>Hello \n<form>\n<input type=\"submit\" value=\"ok\">\n" + 
            "</form>\n</td>\n" + 
            "<td>World \n<form>\n<input type=\"submit\" value=\"ok\">\n" + 
            "</form>\n</td>\n" + 
            "</tr>\n</table>\n</body>\n</html>";

java.util.regex.Pattern p =
                java.util.regex.Pattern.compile(
                        "(align=\"left\">\\n)(?<part>.*?)(<\\/form>\\n<\\/td>)");
java.util.regex.Matcher m = p.matcher(frontPage);

List<String> parts = new ArrayList<>();
while (m.find()) {
    parts.add(m.group("part")); 
}
if (parts.size() == 0)
    System.out.println("No page parts found");
else {
    System.out.println("Something matches at least");
}

It finds matches when only the first two groups are specified, but as soon as I add even a simple sequence such as (form) to the last group, it stops matching anything, and I can't work out why.

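Pass the DOTALL flag to Pattern.compile. By default, . does not match line terminators, so the lazy .*? cannot cross the newlines that sit between align="left"> and </form>. With only the first two groups the pattern still matches, because .*? is allowed to match the empty string. With the flag added: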

java.util.regex.Pattern.compile(
        "(align=\"left\">\\n)(?<part>.*?)(<\\/form>\\n<\\/td>)",
        java.util.regex.Pattern.DOTALL
);

See it here at ideone.
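
To make the difference concrete, here is a minimal, self-contained sketch that runs the question's pattern against the question's test string three ways: with no flags, with Pattern.DOTALL, and with the inline (?s) flag, which is the embedded form of DOTALL. The class name DotAllDemo and the helper findParts are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the regex and input are copied from the question.
class DotAllDemo {
    static final String REGEX =
            "(align=\"left\">\\n)(?<part>.*?)(<\\/form>\\n<\\/td>)";

    static final String FRONT_PAGE =
            "<html>\n<head>\n<title>Hello</title>\n</head>\n" +
            "<body>\n<table>\n<tr align=\"left\">\n" +
            "<td>Hello \n<form>\n<input type=\"submit\" value=\"ok\">\n" +
            "</form>\n</td>\n" +
            "<td>World \n<form>\n<input type=\"submit\" value=\"ok\">\n" +
            "</form>\n</td>\n" +
            "</tr>\n</table>\n</body>\n</html>";

    // Collects the named group "part" from every match of regex in input.
    static List<String> findParts(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group("part"));
        }
        return parts;
    }

    public static void main(String[] args) {
        // No flags: '.' stops at '\n', so the lazy group never reaches </form>.
        System.out.println("default: " + findParts(REGEX, 0, FRONT_PAGE).size());
        // DOTALL: '.' also matches line terminators.
        System.out.println("DOTALL:  " + findParts(REGEX, Pattern.DOTALL, FRONT_PAGE).size());
        // Inline flag (?s) enables the same mode from inside the pattern.
        System.out.println("(?s):    " + findParts("(?s)" + REGEX, 0, FRONT_PAGE).size());
    }
}
```

With this particular input, the two DOTALL variants find a single match (the Hello cell), because align="left"> occurs only once in the string; the World cell is not preceded by that prefix, so the pattern would need adjusting if both cells are wanted.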
