
Reading XML DOCTYPE info with Python

I need to read the version information from an XML file like the following.

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE twReport [ 
<!ELEMENT twReport (twHead?, (twWarn | twDebug | twInfo)*, twBody, twSum?, 
               twDebug*, twFoot?, twClientInfo?)> 
<!ATTLIST twReport version CDATA "10,4"> <----- VERSION INFO HERE

I use xml.dom.minidom to parse the XML file, and I need to extract the version that is declared in the embedded DTD.

  • Can I use xml.dom.minidom for this purpose?
  • Is there any Python XML parser that can do this?

How about xmlproc's DTD API?

Here's a snippet I wrote years ago to do some work with DTDs from Python, which might give you an idea of what it's like to work with this library:

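# xmlproc was distributed with the old PyXML package (Python 2 era).
from xml.parsers.xmlproc import dtdparser

attr_separator = '_'
child_separator = '_'

# Load a standalone DTD file into an in-memory DTD object.
dtd = dtdparser.load_dtd('schedule.dtd')

for name, element in dtd.elems.items():
    # One line per attribute declared for this element.
    for attr in element.attrlist:
        output = '%s%s%s = ' % (name, attr_separator, attr)
        print(output)
    # One line per child element allowed at the start of the content model.
    for child in element.get_valid_elements(element.get_start_state()):
        output = '%s%s%s = ' % (name, child_separator, child)
        print(output)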

(FYI, this was the first result when searching for "python dtd parser".)

Because both of the standard library XML libraries (xml.dom.minidom and xml.etree) use the same parser (xml.parsers.expat), you are limited in the "quality" of XML data you are able to parse successfully.

You're better off using one of the tried-and-true third-party modules such as lxml or BeautifulSoup, which are not only more resilient to errors but will also give you exactly what you are looking for with little trouble.
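For the specific case in the question, the expat parser mentioned above may already be enough. Here is a minimal sketch, assuming the DTD is an internal subset as in the question. It uses the AttlistDeclHandler callback of xml.parsers.expat to collect attribute defaults, and minidom's doctype.internalSubset to get the raw DTD text. The SAMPLE document and the read_attlist_defaults helper are made up for illustration: the question's DTD is truncated, so a shortened, complete one stands in for it.

```python
import xml.parsers.expat
from xml.dom import minidom

# Hypothetical, shortened stand-in for the truncated document in the question.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE twReport [
<!ELEMENT twReport (twBody)>
<!ELEMENT twBody (#PCDATA)>
<!ATTLIST twReport version CDATA "10,4">
]>
<twReport><twBody>example</twBody></twReport>
"""


def read_attlist_defaults(xml_bytes):
    """Return {(element, attribute): default} for every ATTLIST declaration."""
    defaults = {}

    def on_attlist(elname, attname, type_, default, required):
        # expat calls this once per declared attribute; default is None
        # when the declaration has no default value (e.g. #IMPLIED).
        defaults[(elname, attname)] = default

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_bytes, True)
    return defaults


defaults = read_attlist_defaults(SAMPLE)
print(defaults[("twReport", "version")])  # 10,4

# minidom keeps the DOCTYPE name and the raw text of the internal subset.
doc = minidom.parseString(SAMPLE)
print(doc.doctype.name)
print(doc.doctype.internalSubset)
```

The expat callback gives structured access to each declaration, while internalSubset is only a string that would still need to be searched by hand. Neither approach reads an external DTD file.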
