memory leak parsing xml using xml.dom.minidom

Question

I'm using xml.dom.minidom to parse XML files, roughly like this:

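import xml.dom.minidom as dom

# Close the file handle as soon as parsing is done
with open('file.xml') as file:
    doc = dom.parse(file)
# SNIP
doc.unlink()  # break the DOM's internal reference cycles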

Even after unlinking the document, memory usage stays at about 120 MiB. In real use, where the program parses multiple XML files, it climbs to about 300 MiB, which is unacceptable.

I'm sure the memory leak isn't caused by my code, but by minidom, because even doing just

doc = dom.parse(file)
doc.unlink()

produces the same result.

Am I doing something wrong, or is this a bug in minidom?

PS: I'd prefer to stick with minidom, because there's a lot of XML parsing in my code and I'd rather not rewrite all of it, but I will if there's no other choice.
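
One way to narrow this down is to check whether Python-level allocations actually drop after unlink(). The following is only a minimal sketch: the helper name traced_after_parse and the in-memory sample document are made up for illustration, and tracemalloc reports only memory allocated through Python's allocators, so its figures will not match what the operating system shows for the process.

```python
import gc
import tracemalloc
import xml.dom.minidom as dom


def traced_after_parse(xml_text):
    """Return (bytes traced while the DOM is alive, bytes traced after unlink).

    Hypothetical helper for illustration; not part of minidom.
    """
    tracemalloc.start()
    try:
        doc = dom.parseString(xml_text)
        alive, _peak = tracemalloc.get_traced_memory()
        doc.unlink()   # break the DOM's reference cycles
        del doc        # drop our own reference to the document
        gc.collect()   # collect anything still caught in a cycle
        released, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return alive, released


# Assumed sample input: a synthetic document with many small elements.
sample = "<root>" + "<item id='1'>text</item>" * 10000 + "</root>"
alive, released = traced_after_parse(sample)
print("traced with DOM alive:", alive)
print("traced after unlink():", released)
```

If the second figure is far below the first while the process size reported by the operating system stays high, the memory is no longer held by live Python objects, which is a different situation from a leak inside the DOM tree itself.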

Answer 1

I am seeing the same issue with minidom, and we are not alone. See, for example, here.

The suggestion there is to use another XML implementation with Python bindings, such as:

xml.etree.ElementTree: an alternative implementation in the Python standard library
libxml2: an XML parser written in C, with Python bindings
lxml: a more Pythonic binding to libxml2
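
As a rough illustration of the first option, here is a minimal sketch that uses xml.etree.ElementTree.iterparse to process a document incrementally and clears each element once it has been handled. The function name count_items, the tag name "item", and the sample data are assumptions made for this example; they do not come from the original code.

```python
import io
import xml.etree.ElementTree as ET


def count_items(source, tag):
    """Count elements named `tag`, clearing each one after it is handled."""
    count = 0
    # "end" events fire once an element and all its children are fully parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop this element's text, attributes, and children.
            elem.clear()
    return count


# Assumed sample input; a real program would pass a filename or open file.
sample = io.BytesIO(b"<root><item>a</item><item>b</item></root>")
print(count_items(sample, "item"))
```

Clearing elements as they are processed keeps their contents from accumulating while the file is read, which is the usual reason for choosing iterparse for large inputs. Code that relies on DOM methods such as getElementsByTagName would still need to be adapted to the ElementTree API.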

1 answer

solution1
2 ACCEPTED 2014-12-10 10:01:16