
Jsoup remove node and children while traversing

I am trying to traverse the nodes of a DOM with Jsoup and remove some nodes and their children if a condition is met. However, I get a java.lang.NullPointerException when doing so. I have something like:

import java.io.File;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

File input = new File(inputPath);
Document doc = Jsoup.parse(input, "UTF-8");

doc.traverse(new NodeVisitor() {

    @Override
    public void head(Node node, int depth) {

        switch (node.getClass().getName()) {

            case "org.jsoup.nodes.Element":

                Element elem = (Element) node;
                Map<String, String> dataset = elem.dataset();
                for (String key : dataset.keySet()) {

                    // ...

                    // Here is the problem
                    if (someCondition) node.remove();
                }
                break;

            case "org.jsoup.nodes.TextNode":

                // ...
                break;
        }
    }

    @Override
    public void tail(Node node, int depth) {

    }
});

It makes some sense that it won't let me remove nodes while iterating over them, but then how can I achieve this? How do I remove a node and its children while traversing the DOM?

Removing nodes in head() or tail() does not work reliably (whether it fails seems to depend on which nodes you remove): the traverser steps to the next node through the current node's siblings and parent, and a node you have just removed has neither. Instead of removing while traversing, you can simply store references to the nodes you want to remove and process them afterwards.

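import java.util.LinkedList;
import java.util.List;

import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

List<Node> toRemove = new LinkedList<>();
doc.traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int depth) {
        // ...
        if (condition) {
            toRemove.add(node); // only record the node; do not detach it yet
        }
    }
    // ...
});

// The traversal has finished, so detaching nodes can no longer disturb it.
for (Node node : toRemove) {
    node.remove();
}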

The sample above should work, even if you remove all non-root nodes.
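For a fuller picture, here is a minimal, self-contained sketch of the same collect-then-remove idea applied to a dataset check like the one in the question. It assumes jsoup is on the classpath; the class name DeferredRemoval, the methods shouldRemove and removeMarked, and the data-remove marker attribute are hypothetical stand-ins for the question's elided someCondition.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

public class DeferredRemoval {

    // Hypothetical condition: the element carries a data-remove attribute.
    // dataset() strips the "data-" prefix, so the key is just "remove".
    static boolean shouldRemove(Element elem) {
        return elem.dataset().containsKey("remove");
    }

    // One traversal collects the matches; they are detached only after
    // doc.traverse(...) has returned, so the traverser never meets a
    // node that has lost its parent.
    static Document removeMarked(Document doc) {
        List<Node> toRemove = new ArrayList<>();
        doc.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof Element && shouldRemove((Element) node)) {
                    toRemove.add(node);
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do on the way back up.
            }
        });
        // A nested match is still attached to its (already detached)
        // ancestor, so remove() on it is safe.
        for (Node node : toRemove) {
            node.remove();
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(
                "<div data-remove='yes'><p>gone</p></div><p id='keep'>kept</p>");
        removeMarked(doc);
        System.out.println(doc.body().html());
    }
}
```

The Document root itself has an empty dataset, so it is never collected, which matters because remove() needs a parent. Newer jsoup releases also offer org.jsoup.select.NodeFilter, whose head() can return NodeFilter.FilterResult.REMOVE to drop a node and its children during the walk; check that your jsoup version includes it before relying on it.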

Just guessing: try removing the node at the end of the traversal method. Or restart the traversal each time you remove a node.
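The "restart the traversal" suggestion can be sketched as follows, assuming jsoup is on the classpath. The class RestartingRemoval, the method removeByRestarting and the data-remove marker are hypothetical. Each pass only records the first matching node and detaches it after the pass has ended, so no node is removed while the traverser is still using it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

public class RestartingRemoval {

    // Hypothetical condition: the element carries a data-remove attribute.
    static boolean shouldRemove(Node node) {
        return node instanceof Element
                && ((Element) node).dataset().containsKey("remove");
    }

    // Each pass walks the whole tree but records only the first match;
    // that node is detached after the pass, then a new pass starts.
    // Returns the number of detached nodes (nested matches leave together
    // with their removed ancestor and are not counted separately).
    static int removeByRestarting(Document doc) {
        int removed = 0;
        while (true) {
            final Node[] first = new Node[1];
            doc.traverse(new NodeVisitor() {
                @Override
                public void head(Node node, int depth) {
                    if (first[0] == null && shouldRemove(node)) {
                        first[0] = node;
                    }
                }

                @Override
                public void tail(Node node, int depth) {
                    // Nothing to do on the way back up.
                }
            });
            if (first[0] == null) {
                return removed; // a full pass found nothing left to remove
            }
            first[0].remove();
            removed++;
        }
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(
                "<div data-remove='1'><span data-remove='2'>x</span></div>"
                + "<p data-remove='3'>y</p><p>z</p>");
        int removed = removeByRestarting(doc);
        System.out.println(removed + " node(s) removed: " + doc.body().html());
    }
}
```

Because every pass walks the whole document again, this costs roughly one full traversal per removed node, so the collect-then-remove approach from the previous answer is likely the better fit for large documents.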
