简体   繁体   中英

How to resolve external entities with xml.etree like lxml.etree

I have a script that parses XML using lxml.etree :

from lxml import etree

parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
tree = etree.parse('main.xml', parser=parser)

I need load_dtd=True and resolve_entities=True be have &emptyEntry; from globals.xml resolved:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE map SYSTEM "globals.xml" [
    <!ENTITY dirData "${DATADIR}"> 
]>
<map 
    xmlns:map="http://my.dummy.org/map"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsschemaLocation="http://my.dummy.org/map main.xsd"
>

  &emptyEntry; <!-- from globals.xml -->

  <entry><key>KEY</key><value>VALUE</value></entry>
  <entry><key>KEY</key><value>VALUE</value></entry>
</map>

with globals.xml

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY emptyEntry "<entry></entry>">

Now I would like to move from non-standard lxml to standard xml.etree . But this fails with my file because the load_dtd=True and resolve_entities=True is not supported by xml.etree .

Is there an xml.etree -way to have these entities resolved?

My trick is to use the external program xmllint

proc = subprocess.Popen(['xmllint','--noent',fname],stdout=subprocess.PIPE)
output = proc.communicate()[0]
tree = ElementTree.parse(StringIO.StringIO(output))

lxml is a right tool for the job.

But, if you want to use stdlib, then be prepared for difficulties and take a look at XMLParser's UseForeignDTD method. Here's a good (but hacky) example: Python ElementTree support for parsing unknown XML entities?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM