LXML issue parsing XML schema in Python 3
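I am trying to use the XRDTools library to convert Panalytical XRDML files into a more database-friendly format, such as a Pandas DataFrame.
The XRDTools library is described here: https://github.com/paruch-group/xrdtools. It imports XRDML files into a Python dictionary. I am completely new to LXML, so I apologize if this is a simple question.
I created Python 2.7 and 3.6 environments with Anaconda specifically for working with the XRDTools package. I would like to run it in Python 3.6.
In Python 2.7, this code runs smoothly: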
import xrdtools
xrd = xrdtools.read_xrdml('filename.xrdml')
The output is a dict:
{u'2Theta': array([63. , 63.00334225, 63.00668449, ..., 67.99331551,
67.99665775, 68. ]),
u'Lambda': 1.540598,
u'Omega': array([31. , 31.00200535, 31.0040107 , ..., 33.9959893 ,
33.99799465, 34. ]), ...
I can then use the dictionary like any other Python object.
In Python 3.6, the same code produces the following error message:
Traceback (most recent call last):
File "...\AppData\Local\Continuum\Anaconda2\envs\py36xrd\lib\site-packages\IPython\core\interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-3-b6f5409b8bf9>", line 1, in <module>
xrd = xrdtools.read_xrdml('filename.xrdml')
File "...\XRDTools\xrdtools\xrdtools\io.py", line 297, in read_xrdml
valid = validate_xrdml_schema(filename)
File "...\XRDTools\xrdtools\xrdtools\io.py", line 43, in validate_xrdml_schema
xmlschema_doc = etree.parse(f)
File "src\lxml\etree.pyx", line 3444, in lxml.etree.parse (src\lxml\etree.c:83171)
File "src\lxml\parser.pxi", line 1855, in lxml.etree._parseDocument (src\lxml\etree.c:121011)
File "src\lxml\parser.pxi", line 1875, in lxml.etree._parseFilelikeDocument (src\lxml\etree.c:121294)
File "src\lxml\parser.pxi", line 1770, in lxml.etree._parseDocFromFilelike (src\lxml\etree.c:120078)
File "src\lxml\parser.pxi", line 1185, in lxml.etree._BaseParser._parseDocFromFilelike (src\lxml\etree.c:114806)
File "src\lxml\parser.pxi", line 598, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\etree.c:107724)
File "src\lxml\parser.pxi", line 709, in lxml.etree._handleParseResult (src\lxml\etree.c:109433)
File "src\lxml\parser.pxi", line 638, in lxml.etree._raiseParseError (src\lxml\etree.c:108287)
File "...\XRDTools\xrdtools\xrdtools\data\schemas\XRDMeasurement15.xsd", line 1
<?xml version="1.0" encoding="UTF-8"?>
^
XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
Digging into io.py, there is this function:
def validate_xrdml_schema(filename):
    """Validate the xml schema of a given file.

    Parameters
    ----------
    filename : str
        The Filename of the `.xrdml` file to test.

    Returns
    -------
    float or None
        Returns the version number as float or None if
        the file was not matching any provided xml schema.
    """
    schemas = [(1.5, 'data/schemas/XRDMeasurement15.xsd'),
               (1.4, 'data/schemas/XRDMeasurement14.xsd'),
               (1.3, 'data/schemas/XRDMeasurement13.xsd'),
               (1.2, 'data/schemas/XRDMeasurement12.xsd'),
               (1.1, 'data/schemas/XRDMeasurement11.xsd'),
               (1.0, 'data/schemas/XRDMeasurement10.xsd'),
               ]
    schemas = [(v, os.path.join(package_path, schema)) for v, schema in schemas]

    with open(filename, 'r') as f:
        data_xml = etree.parse(f)

    for version, schema in schemas:
        with open(schema, 'r') as f:
            xmlschema_doc = etree.parse(f)
        xmlschema = etree.XMLSchema(xmlschema_doc)
        valid = xmlschema.validate(data_xml)
        if valid:
            return version
    return None
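From what I have read, the line xmlschema_doc = etree.parse(f) is causing the problem. If I change that line to etree.parse(filename), it runs without error, but I am not sure whether that matters. I have also not been able to apply this fix to anything other than a small standalone cell in a Jupyter notebook.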
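What is causing the error? Is there a way to fix it for Python 3? What is the best way to implement that fix?
I would love to get this solved. TIA!
The most relevant question I could find: Python 3.4 lxml.etree: Start tag expected, '<' not found, line 1, column 1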
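One possible explanation, offered as an assumption because the post does not show the raw bytes of the schema: if the .xsd file begins with a UTF-8 byte order mark, a Python 3 text-mode open(schema, 'r') decodes it with the platform default encoding (often cp1252 on Windows), and the parser then sees stray characters before the first '<'. The sketch below uses a made-up in-memory document, not the real XRDMeasurement15.xsd, and a hypothetical helper starts_with_bom to check an actual file.

```python
import codecs

# Hypothetical stand-in for a schema file that was saved with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><root/>'

# Three ways the same bytes can be turned into text:
as_cp1252 = raw.decode('cp1252')        # BOM bytes become three junk characters
as_utf8 = raw.decode('utf-8')           # BOM survives as the single character U+FEFF
as_utf8_sig = raw.decode('utf-8-sig')   # BOM is stripped, text starts with '<'

print(repr(as_cp1252[:4]))
print(repr(as_utf8[:2]))
print(repr(as_utf8_sig[:2]))


def starts_with_bom(path):
    """Return True if the file at `path` begins with a UTF-8 BOM (hypothetical helper)."""
    with open(path, 'rb') as f:
        return f.read(3) == codecs.BOM_UTF8
```

If starts_with_bom returns False for the schema files, this explanation does not apply and the cause lies elsewhere.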
Try:
with io.open(filename, 'r', encoding='utf8') as f:
    data_xml = etree.parse(f)
(io.open because it is called the same way in Python 2 and Python 3.)
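An alternative that avoids decoding in Python altogether is to open the files in binary mode, so that lxml receives raw bytes and works out the encoding from the XML declaration itself. Passing the path straight to etree.parse, as in the question's experiment, has the same effect. The following is a minimal sketch and not the XRDTools implementation: find_matching_schema and its arguments are hypothetical names chosen for illustration.

```python
import io

from lxml import etree


def find_matching_schema(xml_path, schemas):
    """Return the version of the first schema that validates `xml_path`, else None.

    `schemas` is a list of (version, xsd_path) pairs. Hypothetical helper,
    loosely modelled on the validate_xrdml_schema function quoted above.
    """
    # Binary mode: no text decoding happens on the Python side.
    with io.open(xml_path, 'rb') as f:
        data_xml = etree.parse(f)

    for version, xsd_path in schemas:
        with io.open(xsd_path, 'rb') as f:
            xmlschema = etree.XMLSchema(etree.parse(f))
        if xmlschema.validate(data_xml):
            return version
    return None
```

Because io.open exists in both Python 2 and Python 3, the same code should run unchanged in either environment.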