Matching a newline with regular expressions in Java?

I would like to extract HTML content using regular expressions, but I run into problems when the content spans multiple lines: no matches are found. Here is the regular expression that I use:

String regExpContent = "<div class=\"views-field views-field-body\">(\\s+)<span class=\"field-content\">([\\:\\,\\w\\s\\.\\„\\”\\-\\(\\)0123456789(&nbsp;)(\r?\n)]+)</span>(\\s+)</div>";
Pattern regExpMatcherContent = Pattern.compile(regExpContent,
            Pattern.DOTALL | Pattern.UNICODE_CHARACTER_CLASS);

I use (\\r?\\n) to match a newline. Can anybody help me?

Please use an HTML parser, for example jsoup:

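import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String html = "<div class=\"views-field views-field-body\">...</div>";
Document doc = Jsoup.parseBodyFragment(html);
Element body = doc.body();
// CSS selector: every span.field-content nested inside div.views-field-body
Elements fieldContent = body.select("div.views-field-body span.field-content");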

The use of regex for parsing HTML has been discouraged so often that I won't repeat any of the arguments here. Suffice it to say that you really should not do it.

The problem is that you are using regex to parse HTML. You should use an HTML parser.


To answer your question:

Your Pattern.DOTALL is redundant: the flag only changes what . matches, and you are not using . anywhere in your regex.

\\s in your regex already matches newlines, because by default it is equivalent to [ \\t\\n\\x0B\\f\\r].

The problem is with your [\\:\\,\\w\\s\\.\\„\\”\\-\\(\\)0123456789(&nbsp;)(\r?\n)]+. Inside a character class, ( ) and ? are literal characters, so (&nbsp;) and (\r?\n) do not act as groups there. It should be ([:,\\w\\s.„”()-]|&nbsp;)+
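
The three points above can be checked with a small sketch that uses only java.util.regex. The class name FieldContentRegexDemo, the extract helper and the sample markup are made up for illustration. The pattern is the corrected one, with the alternation wrapped in a non-capturing group so that group 1 holds the whole span content. The two quotation marks are written as \u201E and \u201D only to keep the source ASCII-safe.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original question or answers.
final class FieldContentRegexDemo {

    // Corrected pattern: "&nbsp;" is an alternative next to the character
    // class instead of being listed inside it. DOTALL is omitted because
    // the pattern contains no '.' wildcard.
    static final Pattern CONTENT = Pattern.compile(
            "<div class=\"views-field views-field-body\">\\s+"
            + "<span class=\"field-content\">"
            + "((?:[:,\\w\\s.\u201E\u201D()-]|&nbsp;)+)"
            + "</span>\\s+</div>",
            Pattern.UNICODE_CHARACTER_CLASS);

    // Returns the text inside span.field-content, or null if nothing matches.
    static String extract(String html) {
        Matcher m = CONTENT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample input (an assumption): content broken across two lines.
        String html = "<div class=\"views-field views-field-body\">\r\n"
                + "  <span class=\"field-content\">First line, 2013:\r\n"
                + "second&nbsp;line (end).</span>\r\n"
                + "</div>";

        // Prints the two-line span content, line break included.
        System.out.println(extract(html));

        // \s on its own already covers CR, LF, tab and space.
        System.out.println(Pattern.matches("\\s+", "\r\n\t ")); // true
    }
}
```

Even for input this small the pattern is brittle: any character outside the class, such as a nested tag or an exclamation mark, makes the match fail, which is one more reason to prefer a parser.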
