How do I handle closing tags (e.g. </h1>) with the Java HTML Parser Library?
For example, if I have the following:
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.nodes.TagNode;

public class MyFilter implements NodeFilter {
    public boolean accept(Node node) {
        if (node instanceof TagNode) {
            TagNode theNode = (TagNode) node;
            if (theNode.getRawTagName().equals("h1")) {
                return true;
            } else {
                return false;
            }
        }
        return false;
    }
}
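import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class MyParser {
    public final String parseString(String input) throws ParserException {
        Parser parser = new Parser();
        MyFilter theFilter = new MyFilter();
        // Hard-coded sample markup; the input parameter is not used here.
        parser.setInputHTML("<h1>Welcome, User</h1>");
        NodeList theList = parser.parse(theFilter);
        return theList.toHtml();
    }
}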
When I run my parser, I get the following output back:
<h1>Welcome, User</h1>Welcome, User</h1>
The NodeList contains a list of size 3 with the following entities:
(tagNode) <h1>
(textNode) Welcome, User
(tagNode) </h1>
I would like the output to be "<h1>Welcome, User</h1>". Does anyone see what is wrong in my sample parser?
HINT:
I think you need to rely on the isEndTag() method in that case, so that the closing tag is not accepted as a separate node.
Your filter is accepting too many nodes. For your sample input, you want to create a NodeList that has only a single node: the one for the <h1> tag. The other two nodes are children of that first node, so they should not be added to the NodeList.
If you add the following code, you may see better what the problem is.
for (Node node : theList.toNodeArray()) {
    System.out.println(node.toHtml());
}
It should print
<h1>Welcome, User</h1>
Welcome, User
</h1>
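To make the duplication easier to see without the library on the classpath, here is a minimal, self-contained sketch. It assumes a simplified model in which a node renders itself followed by everything nested inside it. SimpleNode, NestedNodeDemo, and all of their methods are hypothetical stand-ins for illustration, not HTML Parser Library types, and the real library may store the end tag differently.

```java
import java.util.ArrayList;
import java.util.List;

class NestedNodeDemo {
    // Hypothetical stand-in for a parsed node; not a library class.
    static class SimpleNode {
        final String markup;      // the node's own text, e.g. "<h1>"
        final boolean endTag;     // true for a closing tag such as "</h1>"
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String markup, boolean endTag) {
            this.markup = markup;
            this.endTag = endTag;
        }

        // A node renders itself followed by everything nested inside it.
        String toHtml() {
            StringBuilder sb = new StringBuilder(markup);
            for (SimpleNode child : children) {
                sb.append(child.toHtml());
            }
            return sb.toString();
        }
    }

    // Models "<h1>Welcome, User</h1>" and returns all three nodes in
    // document order: start tag, text, end tag.
    static List<SimpleNode> buildNodes() {
        SimpleNode h1 = new SimpleNode("<h1>", false);
        SimpleNode text = new SimpleNode("Welcome, User", false);
        SimpleNode close = new SimpleNode("</h1>", true);
        h1.children.add(text);
        h1.children.add(close);

        List<SimpleNode> nodes = new ArrayList<>();
        nodes.add(h1);
        nodes.add(text);
        nodes.add(close);
        return nodes;
    }

    // Concatenates the rendering of every node in the list.
    static String render(List<SimpleNode> nodes) {
        StringBuilder sb = new StringBuilder();
        for (SimpleNode node : nodes) {
            sb.append(node.toHtml());
        }
        return sb.toString();
    }

    // Keeps only opening tags; text nodes and end tags are rejected.
    static List<SimpleNode> startTagsOnly(List<SimpleNode> nodes) {
        List<SimpleNode> kept = new ArrayList<>();
        for (SimpleNode node : nodes) {
            if (node.markup.startsWith("<") && !node.endTag) {
                kept.add(node);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SimpleNode> all = buildNodes();
        // All three nodes: the children are rendered twice.
        System.out.println(render(all));                // <h1>Welcome, User</h1>Welcome, User</h1>
        // Only the start tag: its children are rendered once.
        System.out.println(render(startTagsOnly(all))); // <h1>Welcome, User</h1>
    }
}
```

In the real filter, the equivalent of startTagsOnly would be to accept a node only when it is the opening <h1> tag, so that the text and the closing tag reach the output through their parent rather than as separate list entries.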