Getting internal DTD with lxml
I wanted to try out lxml to get the elements of an internal DTD, but I failed to do this. First, here is my XML file (http://validator.w3.org reports it as valid):
<?xml
version='1.1'
encoding='utf-8'
?>
<!DOCTYPE root [
<!ATTLIST test
attr (A | B | C) 'B'
>
<!ELEMENT test (#PCDATA)>
<!ELEMENT root (test)*>
]>
<root></root>
But using lxml.etree.DTD(file = 'test.xml') throws an exception:
Traceback (most recent call last):
File "./test.py", line 6, in <module>
lxml.etree.DTD(file = 'test.xml')
File "dtd.pxi", line 285, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:152121)
lxml.etree.DTDParseError: Content error in the external subset, line 5, column 1
Maybe lxml.etree.DTD doesn't support internal DTDs, or I'm doing something wrong. I also wanted to try lxml.etree.parse(), but I can't figure out the methods of the object it returns (I have looked at the reference for parse(), but it doesn't link to those methods).
我也想尝试lxml.etree.parse(),但是我无法弄清楚此类的方法(我已经参考了parse()的引用,但未链接到这些方法)。 The task is in theory simple but I can't find the needed informations.
理论上,该任务很简单,但是我找不到所需的信息。
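On the exception itself: lxml.etree.DTD parses its argument as a standalone DTD, i.e. the contents of a separate .dtd file (an "external subset"), not as a whole XML document. That is consistent with the error pointing at line 5, column 1, where `<!DOCTYPE` starts, since a DOCTYPE wrapper is not allowed inside a DTD file. A minimal sketch, assuming you can move the declarations into their own string or file (the names `dtd_source` and `doc` are only for illustration):

```python
import io

import lxml.etree as ET

# Only the markup declarations, as they would appear in a separate .dtd file.
# Note: no <!DOCTYPE root [ ... ]> wrapper around them.
dtd_source = """\
<!ATTLIST test
attr (A | B | C) 'B'
>
<!ELEMENT test (#PCDATA)>
<!ELEMENT root (test)*>
"""

# DTD() accepts a filename or a file-like object such as StringIO.
dtd = ET.DTD(io.StringIO(dtd_source))

# List the declared element names.
names = [decl.name for decl in dtd.elements()]
print(names)

# The same DTD object can validate a document that does not embed a DTD.
doc = ET.fromstring(b"<root><test attr='A'>x</test></root>")
print(dtd.validate(doc))
```

Keeping the declarations in a separate file is also what lets several documents share one DTD; the internal subset in the original file is only reachable after parsing that file as a document.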
I'm not sure what you are looking for, but you may be able to find it using an interactive Python interpreter with tab completion, such as IPython. That's how I found this:
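import io
import lxml.etree as ET

# io.BytesIO needs bytes in Python 3, so the document is a bytes literal.
content = b'''<?xml
version='1.1'
encoding='utf-8'
?>
<!DOCTYPE root [
<!ATTLIST test
attr (A | B | C) 'B'
>
<!ELEMENT test (#PCDATA)>
<!ELEMENT root (test)*>
]>
<root></root>'''
tree = ET.parse(io.BytesIO(content))
info = tree.docinfo          # document-level info: doctype, encoding, DTDs
dtd = info.internalDTD       # DTD object for the <!DOCTYPE root [ ... ]> subset
for elt in dtd.elements():   # one declaration object per <!ELEMENT ...>
    print(elt)
    print(elt.content)       # the element's declared content model
    print()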
# <lxml.etree._DTDElementDecl object name='test' prefix=None type='mixed' at 0xb73e044c>
# <lxml.etree._DTDElementContentDecl object name=None type='pcdata' occur='once' at 0xb73e04ac>
# <lxml.etree._DTDElementDecl object name='root' prefix=None type='element' at 0xb73e046c>
# <lxml.etree._DTDElementContentDecl object name='test' type='element' occur='mult' at 0xb73e04ac>
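The same DTD object also exposes the `<!ATTLIST ...>` declarations, which is where the `attr (A | B | C) 'B'` information from the question lives. A minimal sketch using the element declarations' `iterattributes()` and the attribute declarations' `values()`; the `found` dictionary is just a hypothetical container for the results:

```python
import io

import lxml.etree as ET

content = b'''<?xml
version='1.1'
encoding='utf-8'
?>
<!DOCTYPE root [
<!ATTLIST test
attr (A | B | C) 'B'
>
<!ELEMENT test (#PCDATA)>
<!ELEMENT root (test)*>
]>
<root></root>'''

dtd = ET.parse(io.BytesIO(content)).docinfo.internalDTD

# Map (element name, attribute name) -> (type, default value, allowed values).
found = {}
for elt in dtd.iterelements():
    for attr in elt.iterattributes():
        found[(elt.name, attr.name)] = (attr.type, attr.default_value, attr.values())
        print(elt.name, attr.name, attr.type, attr.default_value, attr.values())
```

For an enumerated attribute, `values()` gives the allowed tokens and `default_value` the declared default.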