I am trying to traverse the nodes of a DOM with Jsoup and remove a node and its children when a condition is met. However, doing so throws a java.lang.NullPointerException. I have something like:
    File input = new File(inputPath);
    Document doc = Jsoup.parse(input, "UTF-8");
    doc.traverse(new NodeVisitor() {
        @Override
        public void head(Node node, int depth) {
            switch (node.getClass().getName()) {
                case "org.jsoup.nodes.Element":
                    Element elem = (Element) node;
                    Map<String, String> dataset = elem.dataset();
                    for (String key : dataset.keySet()) {
                        .....
                        // Here is the problem
                        if (someCondition) node.remove();
                    }
                    break;
                case "org.jsoup.nodes.TextNode":
                    ....
                    break;
            }
        }

        @Override
        public void tail(Node node, int depth) {
        }
    });
It makes some sense that I can't remove nodes while iterating over them, but then what is the right way to remove a node and its children while traversing the DOM?
Removing nodes in head() or tail() will not work reliably (it seems to depend on which nodes you remove). Instead of removing nodes while traversing, you can store references to the nodes you want to remove and process them afterwards.
    List<Node> toRemove = new LinkedList<>();
    doc.traverse(new NodeVisitor() {
        @Override
        public void head(Node node, int depth) {
            // ...
            if (condition) {
                toRemove.add(node);
            }
        }
        // ...
    });
    // Remove the collected nodes only after the traversal has finished
    for (Node node : toRemove) {
        node.remove();
    }
The sample above should work, even if you remove all non-root nodes.
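To try the collect-then-remove pattern without adding a dependency, here is a minimal sketch that uses the JDK's built-in org.w3c.dom API instead of Jsoup. The class name DeferredRemovalSketch, the helper methods, and the data-remove="true" condition are all hypothetical and chosen only for illustration. With Jsoup the structure is the same, with node.remove() in place of parent.removeChild(node).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DeferredRemovalSketch {

    // Hypothetical condition: an element is marked with data-remove="true".
    static boolean shouldRemove(Node node) {
        return node instanceof Element
                && "true".equals(((Element) node).getAttribute("data-remove"));
    }

    // Depth-first walk that only records matches; the tree is not modified here.
    static void collect(Node node, List<Node> toRemove) {
        if (shouldRemove(node)) {
            toRemove.add(node);
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, toRemove);
        }
    }

    // Returns the text left in the document once the marked subtrees are detached.
    // Assumes the root element itself is not marked for removal.
    static String remainingText(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input", e);
        }

        List<Node> toRemove = new ArrayList<>();
        collect(doc.getDocumentElement(), toRemove);

        // The traversal is over, so detaching nodes cannot disturb it.
        for (Node node : toRemove) {
            Node parent = node.getParentNode();
            if (parent != null) {
                parent.removeChild(node); // detaches the node and all of its children
            }
        }
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<div><p>keep</p>"
                + "<p data-remove=\"true\">drop <b>me</b></p>"
                + "<span>too</span></div>";
        System.out.println(remainingText(xml));
    }
}
```

Because a removed node takes its children with it, a marked node nested inside another marked node is simply detached from an already detached parent, which is harmless.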
Just guessing: try removing the node at the end of the traverse method, or restart the traversal each time you remove a node.