I'm using Liza Daly's fast_iter, which has this structure:
def fast_iter(context, args=[], kwargs={}):
    """
    Deletes elements as the tree is traversed to prevent the full tree from building and save memory
    Author: Liza Daly, IBM
    """
    for event, elem in context:
        if elem.tag == 'target':
            func(elem, *args, **kwargs)
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    del context
    return save
However, I've noticed that when I create my context as
context = etree.iterparse(path, events=('end',))
The data within the elem gets deleted before my function can even process it. For clarity, I am using fully synchronous code.
If I set my context as
context = etree.iterparse(path, events=('end',), tag='target')
It works correctly; however, I know it's not doing the full memory conservation that fast_iter is intended to provide.
Is there any reason to even use this when compared to xml.dom.pulldom, a SAX parser which creates no tree? It seems like fast_iter attempts to replicate this while staying within lxml.
Does anyone have ideas on what I'm doing wrong? TIA
I think I can see where your approach might delete data before the code that accesses it is called. Let's assume you have, e.g.,
<target>
    <foo>test</foo>
    <bar>test</bar>
</target>
elements in your XML. Then each time an end element tag is found, your code
for event, elem in context:
    if elem.tag == 'target':
        func(elem, *args, **kwargs)
    elem.clear()
    while elem.getprevious() is not None:
        del elem.getparent()[0]
is run. That means it encounters the foo end element tag, then the bar end element tag, where the while loop deletes the foo sibling element. Then the target end element tag is encountered, and I assume your function looks for both the foo and the bar element data, but the foo element has already been deleted.
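To make that sequence concrete, here is a small reproduction, assuming lxml is installed. The function name children_seen_at_target_end and the root wrapper element are made up for this sketch, and elem.clear() is left out on purpose so that only the effect of the while loop is visible.

```python
from io import BytesIO
from lxml import etree

# Sample document from above, wrapped in a hypothetical <root> element.
XML = b"<root><target><foo>test</foo><bar>test</bar></target></root>"

def children_seen_at_target_end(xml_bytes):
    """Return the child tags still attached to <target> when its end tag fires."""
    seen = []
    for event, elem in etree.iterparse(BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "target":
            seen = [child.tag for child in elem]
        # Same cleanup loop as in the question, run for every element.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    return seen

print(children_seen_at_target_end(XML))
```

When the bar end tag fires, bar still has foo as a previous sibling, so the loop removes foo from target before target itself is ever handed to your function.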
So somehow your code has to take the structure (which you probably know) into account and not run that while loop for children/descendants of your target element.
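One way to do that is to skip the cleanup entirely unless the element that just ended is a target. The following is only a sketch under a few assumptions: func is passed in as a parameter (in your version it comes from somewhere else), the tag name 'target' is hard-coded as in your question, target is never the document root, and the collect helper and sample data exist purely for illustration.

```python
from io import BytesIO
from lxml import etree

def fast_iter(context, func, *args, **kwargs):
    """Call func on each <target>, then free it and its earlier siblings."""
    for event, elem in context:
        if elem.tag != "target":
            continue  # leave children of <target> intact until <target> closes
        func(elem, *args, **kwargs)
        elem.clear()
        # Drop already-processed siblings that precede this <target>.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    del context

def collect(elem, rows):
    # Hypothetical processing function: read both children of <target>.
    rows.append((elem.findtext("foo"), elem.findtext("bar")))

XML = (b"<root>"
       b"<target><foo>a</foo><bar>b</bar></target>"
       b"<target><foo>c</foo><bar>d</bar></target>"
       b"</root>")

rows = []
fast_iter(etree.iterparse(BytesIO(XML), events=("end",)), collect, rows)
print(rows)
```

With this guard, foo and bar are still attached when the target end tag arrives, and each target is cleared and unlinked only after your function has seen it.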