
Python 3: Unable to convert XML to dict using xmltodict

I am trying to convert data from an XML file to a Python dict, but am unable to do so. This is the code I'm using:

import xmltodict
input_xml  = 'data.xml'  # This is the source file

with open(input_xml, encoding='utf-8', errors='ignore') as _file:
    data = _file.read()
    data = xmltodict.parse(data,'ASCII')
    print(data)
    exit()

Running this code produces the following error:
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 239, column 40.
After some trial and error, I realized that my XML contains Hindi characters inside one particular tag, as shown below:

<DECL>!! आप की सेवा में पुनः पधारे !!</DECL>

How can I ignore these unencoded characters before running xmltodict.parse?

I would guess the issue is related to the encoding of the file you are reading. Why are you trying to parse it with 'ASCII'?
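As a quick illustration of why ASCII is a poor fit for this data, here is a minimal sketch using only the standard library. The variable names are my own, and the text is the content of the `<DECL>` element from the question.

```python
# Text taken from the <DECL> element in the question.
text = "!! आप की सेवा में पुनः पधारे !!"

# ASCII only covers code points 0-127, so Devanagari cannot be encoded with it.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent every Unicode code point, so the text round-trips cleanly.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(ascii_ok)            # False
print(round_trip == text)  # True
```

So the Hindi characters are not "unencoded"; they just cannot be expressed in ASCII.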

If you read that same XML from a Python string without the 'ASCII' argument, it should work just fine:

import xmltodict
xml = """<DECL>!! आप की सेवा में पुनः पधारे !!</DECL>"""
xmltodict.parse(xml, process_namespaces=True)

Results in:

OrderedDict([('DECL', '!! आप की सेवा में पुनः पधारे !!')]) 

Using a file containing that single line, I am able to parse it properly with:

import xmltodict
input_xml  = 'tmp.txt'  # This is the source file

with open(input_xml, encoding='utf-8', mode='r') as _file:
    data = _file.read()
    data = xmltodict.parse(data)
    print(data)
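The traceback in the question comes from `xml.parsers.expat`, the parser that xmltodict relies on, so the same behaviour can be checked with the standard library alone. The sketch below is an assumption-laden illustration rather than a replacement for the xmltodict call: it writes a small UTF-8 sample file (a stand-in for your real file), opens it in binary mode, and lets expat do the decoding. The file contents and variable names are made up for the example.

```python
import os
import tempfile
from xml.parsers import expat

# Hypothetical sample file standing in for the real data.xml.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<DECL>!! आप की सेवा में पुनः पधारे !!</DECL>"
)
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".xml", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name

# Collect character data; expat may deliver it in several chunks.
chunks = []
parser = expat.ParserCreate()  # no encoding override
parser.CharacterDataHandler = chunks.append

# Binary mode: the parser sees the raw bytes and decodes them itself.
with open(path, "rb") as fh:
    parser.ParseFile(fh)
os.remove(path)

text = "".join(chunks)
print(text)
```

As far as I know, `xmltodict.parse` also accepts a file-like object, so passing a file opened with `mode='rb'` should behave similarly, but I have not verified that against your file.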

The issue is most probably that you are trying to parse it as 'ASCII'.
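If an error still shows up after dropping the 'ASCII' argument, the line and column reported by `ExpatError` can be used to look at the offending text directly instead of searching by trial and error. A minimal sketch, assuming the standard-library expat parser; `locate_xml_error` and the deliberately broken sample document are hypothetical names and data for illustration only.

```python
from xml.parsers import expat


def locate_xml_error(xml_bytes):
    """Return (lineno, offset, message) for the first expat error, or None."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(xml_bytes, True)
    except expat.ExpatError as err:
        # ExpatError carries the position of the problem in the input.
        return err.lineno, err.offset, str(err)
    return None


# Deliberately malformed sample: a bare '&' is not allowed in XML text.
broken = b"<root>\n<a>x & y</a>\n</root>"

result = locate_xml_error(broken)
if result is not None:
    lineno, offset, message = result
    # lineno is 1-based, so subtract 1 to index into the list of lines.
    bad_line = broken.splitlines()[lineno - 1]
    print(message)
    print(bad_line.decode("utf-8", errors="replace"))
```

For a real file, read it with `open(path, 'rb').read()` and pass the bytes to the helper.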


 