
Getting internal DTD with lxml

I wanted to try lxml to get the elements of an internal DTD, but I failed to do so. First, here is my XML file (http://validator.w3.org asserts it is valid):

<?xml
    version='1.1'
    encoding='utf-8'
?>
<!DOCTYPE root [
    <!ATTLIST test
        attr (A | B | C) 'B'
    >
    <!ELEMENT test (#PCDATA)>
    <!ELEMENT root (test)*>
]>
<root></root>

But using lxml.etree.DTD(file = 'test.xml') throws an exception:

Traceback (most recent call last):
  File "./test.py", line 6, in <module>
    lxml.etree.DTD(file = 'test.xml')
  File "dtd.pxi", line 285, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:152121)
lxml.etree.DTDParseError: Content error in the external subset, line 5, column 1

Maybe lxml.etree.DTD doesn't support internal DTDs, or I'm doing something wrong. I also wanted to try lxml.etree.parse(), but I can't figure out the methods of the object it returns (I have looked at the reference for parse(), but it doesn't link to those methods). In theory the task is simple, but I can't find the information I need.
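A note on the traceback: lxml.etree.DTD expects a standalone DTD (a plain sequence of declarations, like an external subset), not an XML document that contains a DOCTYPE. That would explain the reported position: lines 1 to 4 of test.xml can pass as a text declaration, and line 5, column 1 is where `<!DOCTYPE root [` starts, which is not a valid declaration inside a DTD. A minimal sketch of the intended use of DTD(), assuming the declarations are copied into their own DTD text (the variable names and the two sample documents are illustrative):

```python
import io

import lxml.etree as ET

# The question's declarations as a standalone DTD:
# no <?xml ...?> declaration and no <!DOCTYPE root [ ... ]> wrapper.
dtd_text = """
<!ATTLIST test
    attr (A | B | C) 'B'
>
<!ELEMENT test (#PCDATA)>
<!ELEMENT root (test)*>
"""

# DTD() accepts a filename or a file-like object holding DTD declarations.
dtd = ET.DTD(io.StringIO(dtd_text))

# Two illustrative documents: one uses an allowed enum value, one does not.
good = ET.XML(b'<root><test attr="A">x</test></root>')
bad = ET.XML(b'<root><test attr="D">x</test></root>')

print(dtd.validate(good))  # True: "A" is one of the enumerated values
print(dtd.validate(bad))   # False: "D" is not in (A | B | C)

# The same DTD object also lists its element declarations.
print(sorted(el.name for el in dtd.elements()))
```

After a failed validate(), dtd.error_log holds the validator's messages explaining why the document was rejected.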

I'm not sure exactly what you are looking for, but you may be able to find it using an interactive Python interpreter with tab completion, such as IPython (in a plain interpreter, dir() on an object lists its members too). That's how I found this:

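import io

import lxml.etree as ET

# Bytes, not str: in Python 3, io.BytesIO only accepts bytes
# (this answer was originally written for Python 2).
content = b'''<?xml
    version='1.1'
    encoding='utf-8'
?>
<!DOCTYPE root [
    <!ATTLIST test
        attr (A | B | C) 'B'
    >
    <!ELEMENT test (#PCDATA)>
    <!ELEMENT root (test)*>
]>
<root></root>'''

# parse() returns an ElementTree; its docinfo exposes the internal subset.
tree = ET.parse(io.BytesIO(content))
info = tree.docinfo
dtd = info.internalDTD

for elt in dtd.elements():
    print(elt)          # the <!ELEMENT ...> declaration
    print(elt.content)  # its content model
    print()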

# <lxml.etree._DTDElementDecl object name='test' prefix=None type='mixed' at 0xb73e044c>
# <lxml.etree._DTDElementContentDecl object name=None type='pcdata' occur='once' at 0xb73e04ac>

# <lxml.etree._DTDElementDecl object name='root' prefix=None type='element' at 0xb73e046c>
# <lxml.etree._DTDElementContentDecl object name='test' type='element' occur='mult' at 0xb73e04ac>
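The question's ATTLIST is reachable from the same internalDTD object: each element declaration returned by iterelements() (or elements()) has iterattributes(), and each attribute declaration exposes properties such as name, type and default_value, plus a values() method for enumerated types. A minimal sketch using the question's internal subset; collecting the results into a list of tuples named `found` is just an illustrative choice:

```python
import io

import lxml.etree as ET

# The question's internal subset; the XML declaration is omitted for brevity.
content = b'''<!DOCTYPE root [
    <!ATTLIST test
        attr (A | B | C) 'B'
    >
    <!ELEMENT test (#PCDATA)>
    <!ELEMENT root (test)*>
]>
<root></root>'''

dtd = ET.parse(io.BytesIO(content)).docinfo.internalDTD

found = []
for elt in dtd.iterelements():
    # Attribute declarations are attached to their element declaration.
    for attr in elt.iterattributes():
        found.append((
            elt.name,            # element the attribute belongs to
            attr.name,           # attribute name
            attr.type,           # e.g. 'enumeration' for (A | B | C)
            attr.values(),       # the enumerated values
            attr.default_value,  # the declared default
        ))

print(found)
```

Only `test` declares an attribute here, so `root` contributes nothing to `found`.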
