Memory leak parsing XML using xml.dom.minidom

I'm using xml.dom.minidom to parse XML files, somewhat like this:

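import xml.dom.minidom as dom

# Open the file in a with-block so the handle is closed after parsing
with open('file.xml') as file:
    doc = dom.parse(file)
# SNIP
# Break the reference cycles between nodes so they can be freed
doc.unlink()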

Even after unlinking the document, memory usage stays at about 120 MiB. In real use, where multiple XML files get parsed, memory usage climbs to about 300 MiB, which is unacceptable.

I'm sure the memory leak isn't caused by my code, but by minidom, because even doing just

doc = dom.parse(file)
doc.unlink()

produces the same result.
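One way to narrow this down is to check whether Python objects are still being held after unlink(), or whether the process simply has not handed freed memory back to the operating system. The following is a minimal sketch using the standard tracemalloc module; the sample document and the measure_parse helper are made up for illustration, and the numbers it reports cover only allocations traced by Python, not the process size shown by the OS.

```python
import gc
import tracemalloc
import xml.dom.minidom as dom

# Hypothetical sample document standing in for a real file.
SAMPLE_XML = "<root>" + "<item>text</item>" * 10000 + "</root>"

def measure_parse(xml_text):
    """Return traced memory (bytes) before parsing, after parsing, and after unlink."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()

    doc = dom.parseString(xml_text)
    during, _ = tracemalloc.get_traced_memory()

    # Break node reference cycles, drop our reference, and force a collection.
    doc.unlink()
    del doc
    gc.collect()
    after, _ = tracemalloc.get_traced_memory()

    tracemalloc.stop()
    return before, during, after

before, during, after = measure_parse(SAMPLE_XML)
print(f"before={before} during={during} after={after}")
```

If the "after" figure falls back close to "before" while the OS still reports high usage, the memory was probably released by Python but not returned to the system, which is a different problem from objects leaking inside minidom.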

Am I doing something wrong, or is this a bug in minidom?

PS: I'd prefer to stick with minidom, because there's a lot of XML parsing happening in my code, and I'd rather not completely rewrite all of it, but I will if there's no other choice.

I am seeing the same issue with minidom, and we are not alone. See for example here.

It is suggested there to use another XML implementation with Python bindings, such as:

  • xml.etree.ElementTree : alternative implementation in the Python standard library
  • libxml2 : XML C parser with Python bindings
  • lxml : a more Pythonic binding to libxml2
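As a rough illustration of the first option, here is a minimal sketch that streams a document with xml.etree.ElementTree.iterparse and clears each element once it has been handled, so the whole tree is not kept populated at once. The sample XML, the "item" tag, and the count_items helper are assumptions made up for this example, not anything from the original code.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; in practice this would be a file on disk.
SAMPLE_XML = b"<items><item id='1'>a</item><item id='2'>b</item></items>"

def count_items(source, tag="item"):
    """Stream through the XML and count elements named `tag`."""
    count = 0
    # "end" events fire once an element and its children are fully parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the element's text, attributes and children after use.
            elem.clear()
    return count

item_count = count_items(io.BytesIO(SAMPLE_XML))
print(item_count)
```

Note that the cleared elements are still referenced by the root as empty shells, so this reduces memory per element rather than eliminating it. Whether it helps in a given program depends on how much of the tree the code needs at one time.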
