Jsoup remove node and children while traversing

Question

I am trying to traverse the nodes of a DOM with Jsoup and remove certain nodes, together with their children, when a condition is met. However, doing so throws a java.lang.NullPointerException. I have something like:

File input = new File(inputPath);
Document doc = Jsoup.parse(input, "UTF-8");

doc.traverse(new NodeVisitor() {

    @Override
    public void head(Node node, int depth) {

      switch (node.getClass().getName()){

        case "org.jsoup.nodes.Element":

            Element elem = (Element) node;
            Map<String, String> dataset = elem.dataset();
            for (String key : dataset.keySet()) {

                .....

                // Here is the problem
                if (someCondition) node.remove();
            }
            break;

       case "org.jsoup.nodes.TextNode":

           ....
           break;
       }
    }

    @Override
    public void tail(Node node, int depth) {

    }
});

It makes some sense that I can't remove nodes while iterating over them, but then what is the right way to remove a node and its children while traversing the DOM?

Answer 1

Removing nodes inside head() or tail() does not work reliably; whether it fails seems to depend on which nodes you remove. Instead of removing while traversing, store references to the nodes you want to remove and remove them once the traversal has finished.

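final List<Node> toRemove = new LinkedList<>();
doc.traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int depth) {
        // ...
        if (condition) {
            // Only remember the node here; the tree is left untouched during the walk.
            toRemove.add(node);
        }
    }
    // ...
});

// The traversal is over, so detaching the nodes (and their children) is safe now.
for (Node node : toRemove) {
    node.remove();
}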

The sample above should work, even if you remove all non-root nodes.
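
If you want to try the collect-then-remove idea without jsoup on the classpath, here is a minimal sketch of the same two-pass pattern using the JDK's built-in org.w3c.dom API as a stand-in. It is not jsoup code: the class name DeferredRemoval, the data-remove attribute and the sample XML are made up for illustration, and the sketch assumes the root element itself is never marked for removal. With jsoup you would keep the NodeVisitor from the snippet above and call node.remove() in the final loop.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DeferredRemoval {

    // Pass 1: walk the tree and only record matches; the tree is not modified here.
    // "data-remove" is a hypothetical marker attribute standing in for the real condition.
    static void collect(Node node, List<Node> toRemove) {
        if (node instanceof Element && ((Element) node).hasAttribute("data-remove")) {
            toRemove.add(node);
            return; // the children leave the tree together with this node
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, toRemove);
        }
    }

    // Parses the XML, drops every marked element with its subtree,
    // and returns the text that is left under the root element.
    // Assumes the root element itself is not marked.
    static String strip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Node> toRemove = new ArrayList<>();
            collect(doc.getDocumentElement(), toRemove);
            // Pass 2: the walk is finished, so detaching nodes is safe now.
            for (Node node : toRemove) {
                node.getParentNode().removeChild(node);
            }
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not process sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><a>keep</a><b data-remove=\"1\">drop<c>child</c></b><d>also</d></root>";
        System.out.println(strip(xml)); // prints "keepalso"
    }
}
```

Returning early from collect once a node is marked is a design choice: the descendants disappear with their ancestor anyway, so there is no need to queue them separately.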

Answer 2

Just a guess: try removing the node at the end of the traverse method, or restart the traversal each time you remove a node.
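
To illustrate the restart idea, here is a rough sketch written against the JDK's built-in org.w3c.dom API rather than jsoup, so that it runs without extra dependencies. Treat it as an assumption-laden illustration, not as jsoup code: the class name RestartingRemoval, the helper methods and the data-remove marker attribute are all hypothetical. Each pass searches from the root for the first matching element, removes it, and then starts over until nothing matches.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class RestartingRemoval {

    // Helper for the demo: parse an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Returns the first element in document order that carries the hypothetical
    // data-remove attribute, or null if there is none (or the node is null).
    static Node findFirst(Node node) {
        if (node == null) {
            return null;
        }
        if (node instanceof Element && ((Element) node).hasAttribute("data-remove")) {
            return node;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            Node hit = findFirst(child);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    // Removes one match per pass and restarts the search from the root each time.
    // Returns how many subtrees were detached.
    static int removeAll(Document doc) {
        int removed = 0;
        Node hit;
        while ((hit = findFirst(doc.getDocumentElement())) != null) {
            hit.getParentNode().removeChild(hit);
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse("<root><a data-remove='1'><b data-remove='1'/></a><c>x</c><d data-remove='1'>y</d></root>");
        System.out.println(removeAll(doc)); // prints 2: <b> goes away together with <a>
    }
}
```

Because every removal triggers a fresh search from the root, this does more work than collecting the nodes first and removing them afterwards, so it is mainly useful when the number of removals is small.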

solution1
2 ACCEPTED 2016-04-14 20:43:38

solution2
0 2016-04-14 14:54:56
