
XML Python parser throwing errors


Here is the XML DTD (at least I think it's a DTD; I'm not very well versed in XML, so please correct me if I'm wrong):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE PATDOC SYSTEM "-US-Grant-025xml.dtdST32" [
<!ENTITY USD0484671-20040106-D00000.TIF SYSTEM "USD0484671-20040106-D00000.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00001.TIF SYSTEM "USD0484671-20040106-D00001.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00002.TIF SYSTEM "USD0484671-20040106-D00002.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00003.TIF SYSTEM "USD0484671-20040106-D00003.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00004.TIF SYSTEM "USD0484671-20040106-D00004.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00005.TIF SYSTEM "USD0484671-20040106-D00005.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00006.TIF SYSTEM "USD0484671-20040106-D00006.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00007.TIF SYSTEM "USD0484671-20040106-D00007.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00008.TIF SYSTEM "USD0484671-20040106-D00008.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00009.TIF SYSTEM "USD0484671-20040106-D00009.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00010.TIF SYSTEM "USD0484671-20040106-D00010.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00011.TIF SYSTEM "USD0484671-20040106-D00011.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00012.TIF SYSTEM "USD0484671-20040106-D00012.TIF" NDATA TIF>
]>
<PATDOC DTD="2.5" STATUS="Build 20030724">
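On the aside about whether this is a DTD: the snippet is the XML declaration followed by a document type declaration. The DTD proper is the external file named by the SYSTEM identifier, and the part in square brackets is the internal subset, which here only declares unparsed (NDATA) entities pointing at the TIF image files. A minimal sketch, assuming lxml and a trimmed copy of the prolog (two of the thirteen entities, plus an empty PATDOC root added so it is a complete document), shows how lxml exposes these pieces:

```python
from lxml import etree

# Trimmed copy of the prolog from the question: two of the thirteen entity
# declarations, plus an empty PATDOC root so the text is a complete document.
PROLOG = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE PATDOC SYSTEM "-US-Grant-025xml.dtdST32" [
<!ENTITY USD0484671-20040106-D00000.TIF SYSTEM "USD0484671-20040106-D00000.TIF" NDATA TIF>
<!ENTITY USD0484671-20040106-D00001.TIF SYSTEM "USD0484671-20040106-D00001.TIF" NDATA TIF>
]>
<PATDOC DTD="2.5" STATUS="Build 20030724"/>
"""

root = etree.XML(PROLOG)
info = root.getroottree().docinfo

# The external DTD is only named here; the default parser does not load it.
print(info.root_name)   # root element name given in <!DOCTYPE ...>
print(info.system_url)  # the SYSTEM identifier, i.e. the external DTD file

# The bracketed internal subset is available as its own DTD object.
for entity in info.internalDTD.iterentities():
    print(entity.name)
```

Only the internal subset is visible to lxml here, so any entity that is declared solely in the external DTD file is unknown to the parser.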

When I try to run my Python parser, I get the following error:

Traceback (most recent call last):
  File "C:\Users\John\Desktop\FINAL BART ALL INFO-Magic Bullet.py", line 75, in <module>
    doc = etree.XML(item)
  File "lxml.etree.pyx", line 2723, in lxml.etree.XML (src/lxml/lxml.etree.c:52448)
  File "parser.pxi", line 1573, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:79932)
  File "parser.pxi", line 1452, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:78774)
  File "parser.pxi", line 960, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:75389)
  File "parser.pxi", line 564, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:71739)
  File "parser.pxi", line 645, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:72614)
  File "parser.pxi", line 585, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:71955)
XMLSyntaxError: Entity 'num' not defined, line 166, column 84

The script takes patent XML data and parses it into a delimited file. I'm also using "from lxml import etree" and "import urllib2, os, zipfile".
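The traceback shows etree.XML(item) being called on one item at a time. A hypothetical sketch of such a loop, assuming each file inside the zip archive holds several XML documents concatenated back to back (each starting with its own <?xml declaration), and written for Python 3 (urllib2 exists only in Python 2; its Python 3 counterpart is urllib.request). The function names iter_documents and parse_all are made up for illustration:

```python
import io
import zipfile

from lxml import etree


def iter_documents(zip_bytes):
    """Yield each XML document found in every member of a zip archive.

    Assumption (hypothetical layout): each member holds several XML
    documents concatenated back to back, each starting with <?xml.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            data = zf.read(name)
            # split() drops the separator, so re-attach the declaration;
            # [1:] skips anything before the first declaration.
            for chunk in data.split(b"<?xml")[1:]:
                yield b"<?xml" + chunk


def parse_all(zip_bytes):
    """Parse every document, collecting failures instead of stopping."""
    roots, errors = [], []
    for index, doc in enumerate(iter_documents(zip_bytes)):
        try:
            roots.append(etree.XML(doc))
        except etree.XMLSyntaxError as exc:
            errors.append((index, str(exc)))
    return roots, errors


# Usage: an in-memory archive with two concatenated documents,
# the second of which uses the undeclared &num; entity.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "grants.xml",
        b'<?xml version="1.0"?>\n<PATDOC>ok</PATDOC>\n'
        b'<?xml version="1.0"?>\n<PATDOC>Claim &num;1</PATDOC>\n',
    )
roots, errors = parse_all(buf.getvalue())
print(len(roots), errors)
```

Collecting failures per document, rather than letting the first XMLSyntaxError end the run, shows which documents contain &num; or other undeclared entities.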

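&num; is the entity for "#", but it is not one of the five entities XML predefines (&lt;, &gt;, &amp;, &apos; and &quot;), so unless the document's DTD declares it, lxml reports it as undefined and refuses to parse the document.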
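A minimal reproduction, assuming only lxml: the same &num; reference fails when nothing declares it, and parses once an ENTITY declaration in the DOCTYPE's internal subset (the same mechanism the patent file uses for its TIF entities) defines it as "#". The element name claim and the helper try_parse are made up for the example:

```python
from lxml import etree

# No DTD at all: &num; is not a predefined XML entity, so parsing fails.
UNDECLARED = b"<claim>Claim &num;1</claim>"

# Same content, but the internal subset declares num as "#".
DECLARED = b"""<!DOCTYPE claim [
<!ENTITY num "#">
]>
<claim>Claim &num;1</claim>"""


def try_parse(data):
    """Return the root element's text, or the parser's error message."""
    try:
        return etree.XML(data).text
    except etree.XMLSyntaxError as exc:
        return "error: %s" % exc


print(try_parse(UNDECLARED))
print(try_parse(DECLARED))
```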

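Check the file's DTD to see whether it declares that entity; if there is no DTD, that is part of the problem. Note also that lxml does not load an external DTD unless the parser is created with load_dtd=True, so entities declared only in the external file are unknown to the default parser.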
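A sketch of the DTD route, under stated assumptions: example.dtd is a hypothetical stand-in for the external DTD named in the DOCTYPE, and the real patent DTD (or the entity files it pulls in) is assumed, not confirmed, to declare num. The parser is created with load_dtd=True so the external declarations are read, and resolve_entities=True so the reference is replaced by its text. As a blunter fallback when the DTD is not available, parse_without_dtd (also hypothetical) rewrites &num; to the numeric character reference &#35; before parsing:

```python
import os
import tempfile

from lxml import etree

# Hypothetical stand-in for the external DTD; the real one is assumed
# to declare num somewhere in its entity declarations.
DTD_TEXT = '<!ENTITY num "#">\n'

DOC_TEXT = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE PATDOC SYSTEM "example.dtd">
<PATDOC>Claim &num;1</PATDOC>
"""


def parse_with_dtd(xml_path):
    # load_dtd=True reads the external DTD so its entity declarations are
    # known; resolve_entities=True substitutes their replacement text.
    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    return etree.parse(xml_path, parser)


def parse_without_dtd(data):
    # Blunt fallback: &#35; is a character reference every XML parser
    # understands, so no entity declaration is needed.
    return etree.XML(data.replace(b"&num;", b"&#35;"))


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.dtd"), "w") as f:
        f.write(DTD_TEXT)
    xml_path = os.path.join(tmp, "doc.xml")
    with open(xml_path, "wb") as f:
        f.write(DOC_TEXT)
    # The relative SYSTEM identifier is resolved against the XML file's location.
    root = parse_with_dtd(xml_path).getroot()

print(root.text)
print(parse_without_dtd(DOC_TEXT).text)
```

Loading the DTD keeps the document's own declarations authoritative; the text replacement is quicker but would also rewrite any literal "&num;" inside CDATA sections.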
