Java HTML Parser and Closing Tags

Question

How do I handle closing tags (e.g. </h1>) with the Java HTML Parser library?

For example, if I have the following:

import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.nodes.TagNode;

public class MyFilter implements NodeFilter {

 // Accepts any tag node whose raw tag name is exactly "h1".
 public boolean accept(Node node) {
  if (node instanceof TagNode) {
   TagNode theNode = (TagNode) node;
   return theNode.getRawTagName().equals("h1");
  }
  return false;
 }
}

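import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class MyParser {
 // Note: the sample markup is hard-coded here, so the input argument is not used.
 public final String parseString(String input) throws ParserException {
  Parser parser = new Parser();
  MyFilter theFilter = new MyFilter();
  parser.setInputHTML("<h1>Welcome, User</h1>");
  // Collect every node that the filter accepts.
  NodeList theList = parser.parse(theFilter);
  return theList.toHtml();
 }
}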

When I run my parser, I get the following output back:

<h1>Welcome, User</h1>Welcome, User</h1>

The NodeList contains three nodes:

(tagNode) <h1>

(textNode) Welcome, User

(tagNode) </h1>

I would like the output to be "<h1>Welcome, User</h1>". Does anyone see what is wrong with my sample parser?

Answer 1

Hint:

I think you need to rely on the isEndTag() method here, so that the filter can tell a closing tag such as </h1> apart from its opening tag.

Answer 2

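Your filter is accepting too many nodes. For your sample input, you want a NodeList that contains only a single node: the one for the <h1> tag. The other two nodes are children of that first node, so they should not be added to the NodeList. Because toHtml() on the <h1> node already renders its children, adding them separately makes them appear twice in the output.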

Adding the following loop makes the problem easier to see:

for (Node node : theList.toNodeArray())
{
    System.out.println(node.toHtml());
}

It should print:

<h1>Welcome, User</h1>
Welcome, User
</h1>
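
To make the duplication concrete without depending on the library, here is a minimal sketch that models the situation with plain Java. Everything in it (ClosingTagSketch, SketchNode, collect, render and the two predicates) is a hypothetical stand-in invented for illustration, not part of the HTML Parser API. It only assumes what is described above: a tag node renders its children, and the list ends up holding the parent together with its children.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ClosingTagSketch {

    // Hypothetical stand-in for a parsed node; this is NOT the library's Node class.
    static final class SketchNode {
        final String own;      // the node's own markup or text
        final String tagName;  // null for text nodes
        final boolean endTag;  // true for a closing tag such as </h1>
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String own, String tagName, boolean endTag) {
            this.own = own;
            this.tagName = tagName;
            this.endTag = endTag;
        }

        // Renders this node followed by all of its children.
        String toHtml() {
            StringBuilder sb = new StringBuilder(own);
            for (SketchNode child : children) {
                sb.append(child.toHtml());
            }
            return sb.toString();
        }
    }

    // Builds the tree for <h1>Welcome, User</h1>: the text and the
    // closing tag are children of the opening tag.
    static SketchNode sample() {
        SketchNode h1 = new SketchNode("<h1>", "h1", false);
        h1.children.add(new SketchNode("Welcome, User", null, false));
        h1.children.add(new SketchNode("</h1>", "h1", true));
        return h1;
    }

    // Visits the node and every descendant, keeping those the filter accepts.
    static List<SketchNode> collect(SketchNode node, Predicate<SketchNode> filter) {
        List<SketchNode> out = new ArrayList<>();
        if (filter.test(node)) {
            out.add(node);
        }
        for (SketchNode child : node.children) {
            out.addAll(collect(child, filter));
        }
        return out;
    }

    // Concatenates toHtml() of every node in the list.
    static String render(List<SketchNode> nodes) {
        StringBuilder sb = new StringBuilder();
        for (SketchNode node : nodes) {
            sb.append(node.toHtml());
        }
        return sb.toString();
    }

    // Too permissive: the parent and both of its children end up in the list.
    static final Predicate<SketchNode> ACCEPT_ALL = n -> true;

    // Keeps only opening <h1> tags; text nodes and closing tags are rejected.
    static final Predicate<SketchNode> OPENING_H1_ONLY =
            n -> "h1".equals(n.tagName) && !n.endTag;

    public static void main(String[] args) {
        // prints <h1>Welcome, User</h1>Welcome, User</h1>
        System.out.println(render(collect(sample(), ACCEPT_ALL)));
        // prints <h1>Welcome, User</h1>
        System.out.println(render(collect(sample(), OPENING_H1_ONLY)));
    }
}
```

In the real filter, the equivalent change would presumably be to return true only when the node is a TagNode, its name is h1, and isEndTag() is false, as the hint in Answer 1 suggests.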

2 answers

solution1
0 ACCEPTED 2010-04-26 19:22:39

solution2
0 2010-04-26 19:28:11
