Hi, I am trying to delete an HTML tag from a string. The tag I am trying to delete is:
<td class="gutter"> text text </td>
I tried the following but nothing worked:
String regex = "<td class=\"gutter\">([^<]*)</td>";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(htmlstring);
m.find() / m.matches()
But I can't seem to find it at all... What am I doing wrong?
You can't use regular expressions to work with HTML (or XML). It is impossible to do it right (not "hard", but technically impossible). Use an HTML parser like Jsoup. Then it is easy: just follow the docs.
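To make the parser suggestion concrete, here is a minimal sketch of removing the gutter cell with Jsoup. It assumes the Jsoup library is on the classpath; the class name `GutterRemover`, the helper `removeGutterCells`, and the sample markup are made up for illustration. The sample wraps the cells in a table because Jsoup parses like a browser, which as far as I know means a bare `<td>` outside a `<table>` does not survive parsing as an element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class GutterRemover {

    // Hypothetical helper: parses the HTML, removes every <td class="gutter">
    // element (including its contents), and returns the remaining body HTML.
    static String removeGutterCells(String html) {
        Document doc = Jsoup.parse(html);
        // CSS selector: any <td> whose class list contains "gutter".
        doc.select("td.gutter").remove();
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Illustrative input only; the real htmlstring from the question is not shown.
        String html = "<table><tr>"
                + "<td class=\"gutter\"> text text </td>"
                + "<td class=\"code\">int x = 1;</td>"
                + "</tr></table>";
        System.out.println(removeGutterCells(html));
    }
}
```

Unlike the regex, the selector keeps matching when the cell carries extra attributes or classes, or contains nested tags. The returned HTML is normalized by Jsoup (for example, a `<tbody>` is added), so it will not be byte-for-byte the input minus the cell.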
If you want to strip tags from HTML, use a library that does that. Don't roll your own HTML parser.
<plug shameless="true">
http://code.google.com/p/owasp-java-html-sanitizer/
A fast and easy to configure HTML Sanitizer written in Java which lets you include HTML authored by third-parties in your web application while protecting against XSS.