Java find html tag

Hi, I am trying to delete an HTML tag from a string. The tag I am trying to delete is:

<td class="gutter"> text text </td>

I tried the following but nothing worked:

String regex = "<td class=\"gutter\">([^<]*)</td>";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(htmlstring);
m.find(); // also tried m.matches()

But I can't seem to find it at all. What am I doing wrong?

You can't use regular expressions to work with HTML (or XML). Doing it right is not just hard but technically impossible, since arbitrarily nested markup is not a regular language. Use an HTML parser such as Jsoup instead. With a parser the task is easy; just follow the docs.
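To make the parser-based approach concrete, here is a minimal sketch. It does not use Jsoup; to stay dependency-free it swaps in the JDK's built-in XML parser and XPath, so it assumes the markup is well-formed XML, which real-world HTML often is not (that is the case a dedicated HTML parser such as Jsoup handles). The class name `GutterRemover`, the helper `removeGutterCells`, and the sample table are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GutterRemover {

    // Hypothetical helper: parses the markup into a DOM tree and removes
    // every <td> whose class attribute is exactly "gutter".
    // Assumption: the input is well-formed XML with a single root element.
    static String removeGutterCells(String markup) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));

        // Select the cells by structure instead of by text pattern.
        NodeList cells = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate("//td[@class='gutter']", doc, XPathConstants.NODESET);

        // Detach each matched cell, including its content, from its parent.
        for (int i = 0; i < cells.getLength(); i++) {
            Node cell = cells.item(i);
            cell.getParentNode().removeChild(cell);
        }

        // Serialize the modified tree back to a string without an XML header.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String htmlstring =
                "<table><tr><td class=\"gutter\"> text text </td><td>kept</td></tr></table>";
        System.out.println(removeGutterCells(htmlstring));
    }
}
```

Because the cell is located in the parsed tree, line breaks, extra whitespace, or nested tags inside the cell do not affect the match the way they would with the `[^<]*` pattern above. The XPath test compares the whole `class` attribute, so a cell with several classes would need a different expression.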

If you want to strip tags from HTML, use a library that does that. Don't roll your own HTML parser.

<plug shameless="true">

http://code.google.com/p/owasp-java-html-sanitizer/

A fast and easy to configure HTML Sanitizer written in Java which lets you include HTML authored by third-parties in your web application while protecting against XSS.

