
Removing DOM nodes when traversing a NodeList

I'm about to delete certain elements in an XML document, using code like the following:

NodeList nodes = ...;
for (int i = 0; i < nodes.getLength(); i++) {
  Element e = (Element)nodes.item(i);
  if (certain criteria involving Element e) {
    e.getParentNode().removeChild(e);
  }
}

Will this interfere with proper traversal of the NodeList? Any other caveats with this approach? If this is totally wrong, what's the proper way to do it?

So, given that removing nodes while traversing the NodeList will cause the NodeList to be updated to reflect the new reality, I assume that my indices will become invalid and this will not work.

So, it seems the solution is to keep track of the elements to delete during the traversal, and delete them all afterward, once the NodeList is no longer used.

NodeList nodes = ...;
Set<Element> targetElements = new HashSet<Element>();
for (int i = 0; i < nodes.getLength(); i++) {
  Element e = (Element)nodes.item(i);
  if (certain criteria involving Element e) {
    targetElements.add(e);
  }
}
for (Element e: targetElements) {
  e.getParentNode().removeChild(e);
}
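A minimal runnable sketch of this two-pass idea follows. The tag name `item`, the `id` and `obsolete` attributes, and the `shouldRemove` helper are assumptions made up for illustration; they stand in for the "certain criteria" above. A `List` is used instead of a `Set` only so that removal happens in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CollectThenRemove {

    // Hypothetical criterion: remove elements marked obsolete="true".
    static boolean shouldRemove(Element e) {
        return "true".equals(e.getAttribute("obsolete"));
    }

    // Returns the ids of the <item> elements that survive the removal.
    static List<String> run(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("item");

            // Pass 1: only read from the NodeList and remember what to delete.
            List<Element> targets = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                if (shouldRemove(e)) {
                    targets.add(e);
                }
            }

            // Pass 2: change the tree; no NodeList index is in use any more.
            for (Element e : targets) {
                e.getParentNode().removeChild(e);
            }

            List<String> remaining = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                remaining.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return remaining;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<root><item id='1' obsolete='true'/>"
                + "<item id='2'/><item id='3' obsolete='true'/></root>"));
    }
}
```

The cost is one extra collection, but the traversal never depends on how the implementation updates the list.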

Removing nodes while looping will cause undesirable results, e.g. missed or duplicated elements. This has nothing to do with synchronization or thread safety; it happens because the loop itself modifies the nodes it is iterating over. Most of Java's Iterators throw a ConcurrentModificationException in such a case, but NodeList has no such safeguard.
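To make the "missed elements" case concrete, here is a small sketch that tries to delete every element with a forward loop. The tag name `item` and the `id` attribute are assumptions chosen for the example, and the result shown assumes the JDK's built-in DOM implementation, whose `getElementsByTagName` list is live.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ForwardLoopSkips {

    // Tries to remove every <item>; returns the ids that were missed.
    static List<String> removeAllForward(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("item");

            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // After this call the live list shifts left by one,
                // but i still advances, so the next element is skipped.
                e.getParentNode().removeChild(e);
            }

            List<String> missed = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                missed.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return missed;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // With a live NodeList, every second element survives.
        System.out.println(removeAllForward("<root><item id='1'/><item id='2'/>"
                + "<item id='3'/><item id='4'/><item id='5'/></root>"));
    }
}
```

No exception is thrown at any point; the loop simply ends early with elements 2 and 4 still in the document.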

It can be fixed by iterating from the end of the list, so that each removal decreases the NodeList size while the loop decrements the index at the same time. This solution is valid only if each loop iteration performs at most one removal.

NodeList nodes = ...;
for (int i = nodes.getLength() - 1; i >= 0; i--) {
  Element e = (Element)nodes.item(i);
  if (certain criteria involving Element e) {
    e.getParentNode().removeChild(e);
  }
}
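The backward loop above could be filled in roughly as follows. The tag name `item`, the `id` and `obsolete` attributes, and the method name `run` are illustrative assumptions, not part of the original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReverseRemoval {

    // Removes <item obsolete="true"> elements, walking the list backward.
    static List<String> run(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("item");

            for (int i = nodes.getLength() - 1; i >= 0; i--) {
                Element e = (Element) nodes.item(i);
                if ("true".equals(e.getAttribute("obsolete"))) {
                    // Only positions above i shift, and those were already visited.
                    e.getParentNode().removeChild(e);
                }
            }

            List<String> remaining = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                remaining.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return remaining;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<root><item id='1' obsolete='true'/>"
                + "<item id='2'/><item id='3' obsolete='true'/></root>"));
    }
}
```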

According to the DOM specification, the result of a call to node.getElementsByTagName("...") is supposed to be "live", that is, any modification made to the DOM tree will be reflected in the NodeList object. Well, for conforming implementations, that is...

NodeList and NamedNodeMap objects in the DOM are live; that is, changes to the underlying document structure are reflected in all relevant NodeList and NamedNodeMap objects.

(DOM Specification)

So, when you modify the tree structure, a conforming implementation will change the NodeList to reflect these changes.
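A quick way to check whether a given implementation behaves this way is to compare `getLength()` on the same NodeList object before and after a removal. This is only a sketch; the tag name `item` and the method name are assumptions for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LiveListCheck {

    // Returns {length before removal, length after removal} for one NodeList.
    static int[] lengthsAroundRemoval() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(
                            "<root><item/><item/><item/></root>")));
            NodeList nodes = doc.getElementsByTagName("item");

            int before = nodes.getLength();
            Node first = nodes.item(0);
            first.getParentNode().removeChild(first);
            // Same NodeList object; getElementsByTagName is not called again.
            int after = nodes.getLength();

            return new int[] { before, after };
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        int[] lengths = lengthsAroundRemoval();
        System.out.println(lengths[0] + " -> " + lengths[1]);
    }
}
```

If the second number is smaller than the first, the list is live and index-based forward loops need one of the workarounds discussed here.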

The Practical XML library now contains NodeListIterator, which wraps a NodeList and provides full Iterator support (this seemed like a better choice than posting the code that we discussed in the comments). If you don't want to use the full library, feel free to copy that one class: http://practicalxml.svn.sourceforge.net/viewvc/practicalxml/trunk/src/main/java/net/sf/practicalxml/util/NodeListIterator.java?revision=125&view=markup
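For readers who only want the general idea, the sketch below shows what an Iterator wrapper with `remove()` support might look like. It is not the Practical XML class and makes no claim about how that class is written; the class name `RemovingNodeListIterator`, the `demo` method, and the `item`, `id` and `obsolete` names are all hypothetical. It also assumes the wrapped NodeList is live.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RemovingNodeListIterator implements Iterator<Node> {
    private final NodeList list;
    private int next = 0;   // index of the node returned by the next call to next()
    private Node current;   // last node returned, or null if already removed

    RemovingNodeListIterator(NodeList list) {
        this.list = list;
    }

    @Override
    public boolean hasNext() {
        return next < list.getLength();
    }

    @Override
    public Node next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        current = list.item(next++);
        return current;
    }

    @Override
    public void remove() {
        if (current == null) {
            throw new IllegalStateException("next() has not been called");
        }
        current.getParentNode().removeChild(current);
        current = null;
        next--; // assumes a live list: later nodes moved into the freed slot
    }

    // Hypothetical usage: drop <item obsolete="true"> and list surviving ids.
    static List<String> demo(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("item");
            Iterator<Node> it = new RemovingNodeListIterator(nodes);
            while (it.hasNext()) {
                Element e = (Element) it.next();
                if ("true".equals(e.getAttribute("obsolete"))) {
                    it.remove();
                }
            }
            List<String> remaining = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                remaining.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return remaining;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```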

According to the DOM Level 3 Core specification,

the result of a call to method node.getElementsByTagName("...") will be a reference to a "live" NodeList type.

NodeList and NamedNodeMap objects in the DOM are live; that is, changes to the underlying document structure are reflected in all relevant NodeList and NamedNodeMap objects. ... changes are automatically reflected in the NodeList, without further action on the user's part.

1.1.1 The DOM Structure Model, para. 2

JavaSE 7 conforms to the DOM Level 3 specification: it implements the live NodeList interface and defines it as a type; it defines and exposes the getElementsByTagName method on Interface Element, which returns the live NodeList type.


References

W3C - Document Object Model (DOM) Level 3 Core Specification - getElementsByTagName

JavaSE 7 - Interface Element

JavaSE 7 - NodeList Type

Old post, but nothing marked as answer. My approach is to iterate from the end, i.e.

for (int i = nodes.getLength() - 1; i >= 0; i--) {
    Element e = (Element) nodes.item(i);
    // do processing, and then
    e.getParentNode().removeChild(e);
}

With this, you needn't worry about the NodeList getting shorter while you delete.

As already mentioned, removing an element reduces the size of the list but the counter is still increasing (i++):

[element 1] <- Delete 
[element 2]
[element 3]
[element 4]
[element 5]

[element 2]  
[element 3] <- Delete
[element 4]
[element 5]
--

[element 2]  
[element 4] 
[element 5] <- Delete
--
--

[element 2]  
[element 4] 
--
--
--

The simplest solution, in my opinion, would be to remove the i++ from the loop header and increment only when the current element was not deleted.

NodeList nodes = ...;
for (int i = 0; i < nodes.getLength();) {
  Element e = (Element)nodes.item(i);
  if (certain criteria involving Element e) {
    e.getParentNode().removeChild(e);        
  } else {
    i++;
  }
}

The pointer stays in the same place when the current element is deleted; the list shifts by itself.

[element 1] <- Delete 
[element 2]
[element 3]
[element 4]
[element 5]

[element 2] <- Leave
[element 3]
[element 4]
[element 5]
--

[element 2] 
[element 3] <- Leave
[element 4]
[element 5]
--

[element 2] 
[element 3] 
[element 4] <- Delete
[element 5]
--

[element 2] 
[element 3] 
[element 5] <- Delete
--
--

[element 2] 
[element 3] 
--
--
--
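The trace above can be reproduced with a small runnable sketch. The tag name `item` and the `id` and `obsolete` attributes are assumptions for illustration; elements 1, 4 and 5 are marked for deletion to match the diagram.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConditionalIncrement {

    // Removes <item obsolete="true">, advancing i only when nothing was removed.
    static List<String> run(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("item");

            for (int i = 0; i < nodes.getLength();) {
                Element e = (Element) nodes.item(i);
                if ("true".equals(e.getAttribute("obsolete"))) {
                    // The live list shifts left, so index i now holds the next node.
                    e.getParentNode().removeChild(e);
                } else {
                    i++;
                }
            }

            List<String> remaining = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                remaining.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return remaining;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<root><item id='1' obsolete='true'/><item id='2'/>"
                + "<item id='3'/><item id='4' obsolete='true'/>"
                + "<item id='5' obsolete='true'/></root>"));
    }
}
```

Note that this loop relies on the list being live: with a non-live list a removed element would stay at index i and the loop would never advance.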

At the end you must write the modified document back to the XML file in your project's path.

TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
DOMSource source = new DOMSource(documentoXml);
StreamResult result = new StreamResult(new File(path + "\\resources\\xml\\UsuariosFile.xml"));
transformer.transform(source, result);

If you omit these lines, your file will not be updated.
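The same serialization step can be tried without touching the file system by writing to a StringWriter instead of a File. This is a sketch under assumptions: the method name and the sample XML are made up, and the XML declaration is omitted only to keep the output short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SerializeAfterRemoval {

    // Removes the first child of the root element and returns the document as text.
    static String removeFirstChildAndSerialize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Node root = doc.getDocumentElement();
            Node first = root.getFirstChild();
            if (first != null) {
                root.removeChild(first); // changes only the in-memory tree
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // Swap in new StreamResult(new File(...)) to write to disk instead.
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(removeFirstChildAndSerialize("<root><a/><b/></root>"));
    }
}
```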
