LXML is killing my CDATA sections

Question

I'm batch-converting a lot of XML files, changing their character encodings to UTF-8:

with open(source_filename, "rb") as source:
    tree = etree.parse(source)

    with open(destination_filename, "wb") as destination:
        tree.write(destination, encoding="UTF-8", xml_declaration=True)

Unfortunately, this destroys my CDATA sections: their contents are written out as escaped text instead.

Source:

<d><![CDATA[áÌÀøÅàùÑÄéú ëÌÄé áÈàÅùÑ éäå''ä ðÄùÑÀôÌÈè <small><small>(ùí ëå èæ)</small></small>

Destination:

<d>בְּרֵאשִׁית כִּי בָאֵשׁ יהו''ה נִשְׁפָּט &lt;small&gt;&lt;small&gt;(שם כו טז)&lt;/small&gt;&lt;/small&gt;

Is there a setting that tells it to leave my CDATA sections alone? I'm mainly using LXML to change the character encoding and to write the XML header properly.

Answer 1

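Use the strip_cdata=False option when creating the parser. By default the lxml parser replaces CDATA sections with plain text nodes, which are then escaped on output: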

import lxml.etree as etree
parser = etree.XMLParser(strip_cdata=False)
with open(source_filename, "rb") as source:
    tree = etree.parse(source, parser=parser)
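
Putting the parser option together with the write call from the question, a minimal end-to-end sketch might look like the following. The helper name convert_to_utf8 and the in-memory sample document are assumptions made for illustration; they are not part of the original code, and real use would pass file objects opened in binary mode instead of BytesIO buffers.

```python
import io

import lxml.etree as etree


def convert_to_utf8(source, destination):
    """Re-encode an XML document as UTF-8 while keeping CDATA sections.

    `source` and `destination` are binary file-like objects.
    (Hypothetical helper, not from the original post.)
    """
    # strip_cdata=False keeps CDATA sections as CDATA in the parsed tree
    parser = etree.XMLParser(strip_cdata=False)
    tree = etree.parse(source, parser=parser)
    # Same write call as in the question
    tree.write(destination, encoding="UTF-8", xml_declaration=True)


# Illustrative ISO-8859-1 input with markup inside a CDATA section
sample = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<d><![CDATA[<small>caf\xe9</small>]]></d>"
)

destination = io.BytesIO()
convert_to_utf8(io.BytesIO(sample), destination)
result = destination.getvalue().decode("utf-8")
print(result)
```

With the option set, the CDATA section should come through as CDATA in the output rather than as &lt;small&gt; escapes.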

Answer 1: 14, accepted, 2014-09-12 17:41:04
