
LXML kills my CDATA sections

I'm batch-converting a lot of XML files, changing their character encoding to UTF-8:

from lxml import etree

with open(source_filename, "rb") as source:
    tree = etree.parse(source)

    with open(destination_filename, "wb") as destination:
        tree.write(destination, encoding="UTF-8", xml_declaration=True)

Unfortunately, it destroys my CDATA sections and writes their contents out as escaped text instead.

Source:

<d><![CDATA[áÌÀøÅàùÑÄéú ëÌÄé áÈàÅùÑ éäå''ä ðÄùÑÀôÌÈè <small><small>(ùí ëå èæ)</small></small>

Destination:

<d>בְּרֵאשִׁית כִּי בָאֵשׁ יהו''ה נִשְׁפָּט &lt;small&gt;&lt;small&gt;(שם כו טז)&lt;/small&gt;&lt;/small&gt;
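A minimal in-memory reproduction of this behaviour, using a small made-up document instead of the Hebrew data above (the sample string is hypothetical). With lxml's default parser, strip_cdata is True, so the CDATA wrapper is discarded during parsing and its contents become ordinary text, which is then escaped on output:

```python
from lxml import etree

# Hypothetical sample document with a single CDATA section.
SOURCE = b"<d><![CDATA[<small>text</small>]]></d>"

# The default parser strips CDATA: the wrapper is dropped and its
# contents become ordinary character data of the <d> element.
default_root = etree.fromstring(SOURCE)
print(repr(default_root.text))

# On serialization, that character data is escaped like any other text.
default_out = etree.tostring(default_root, encoding="UTF-8", xml_declaration=True)
print(default_out.decode("utf-8"))
```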

Is there a setting that tells it to leave my CDATA sections alone? I'm mainly using LXML to change the character encoding and to write the XML header properly.

Use the strip_cdata=False option when creating the parser:

import lxml.etree as etree
# strip_cdata=False keeps CDATA sections as CDATA instead of merging them into text
parser = etree.XMLParser(strip_cdata=False)
with open(source_filename, "rb") as source:
    tree = etree.parse(source, parser=parser)
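A self-contained sketch of the whole conversion with this parser plugged into the question's code. The helper name convert_to_utf8 and the Latin-1 sample file are hypothetical stand-ins for the asker's own files; the lxml calls are the same ones used in the question and the answer.

```python
import os
import tempfile

from lxml import etree


def convert_to_utf8(source_filename, destination_filename):
    """Re-encode an XML file as UTF-8 while keeping CDATA sections.

    Hypothetical helper name; it wraps the question's code and the answer's parser.
    """
    # strip_cdata=False keeps CDATA sections as CDATA nodes in the tree.
    parser = etree.XMLParser(strip_cdata=False)
    with open(source_filename, "rb") as source:
        # The parser decodes the input using the file's own XML declaration.
        tree = etree.parse(source, parser=parser)
    with open(destination_filename, "wb") as destination:
        # Re-serialize as UTF-8 with a fresh XML declaration.
        tree.write(destination, encoding="UTF-8", xml_declaration=True)


# Hypothetical legacy-encoded input (Latin-1 here, as a stand-in).
legacy = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    "<d><![CDATA[café <small>x</small>]]></d>"
).encode("iso-8859-1")

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.xml")
    dst = os.path.join(tmp, "out.xml")
    with open(src, "wb") as f:
        f.write(legacy)
    convert_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted.decode("utf-8"))
```

Since the parser reads the encoding from each source file's own declaration, the same helper should handle other input encodings as long as the underlying libxml2 build can decode them; only the CDATA handling differs from the original code.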


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM