Why are non-ASCII characters escaped in attribute values after writing an XML file with lxml?

I'm trying to build an XML file incrementally in Python, using etree.xmlfile from lxml.

My input is an XML file that has umlauts in attribute values. I read it in with lxml, make some changes to the attribute names, and then write the result to a new file.

This is my code, reduced to the essentials:

from lxml import etree

# path_to_original_file, path_to_new_file and transform_element are defined elsewhere
with etree.xmlfile(path_to_new_file, encoding="utf8") as xf:
    with xf.element("corpus"):
        for _, element in etree.iterparse(path_to_original_file, tag="comment"):
            new_element = transform_element(element)
            xf.write(new_element)
            del element
            del new_element

In the original file, I might have an element like this:

<comment title="Kübel">Some text with umlauts like this üä</comment>

But after processing, the same comment in the new file looks like this:

<comment title="K&#xFC;bel">Some text with umlauts like this üä</comment>

Do you have any idea what might cause this?

ü does not have to be escaped in an XML attribute value (or in a text node child of an element).
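
As a quick check that the escaped and unescaped forms mean the same thing, the sketch below parses both versions of the element and compares the attribute values. It uses the standard-library xml.etree.ElementTree rather than lxml, purely to keep the example free of dependencies; that substitution is an assumption of this sketch, not something from the question.

```python
import xml.etree.ElementTree as ET

# The element as it appears in the original file and in the rewritten file.
original = '<comment title="Kübel">Some text with umlauts like this üä</comment>'
rewritten = '<comment title="K&#xFC;bel">Some text with umlauts like this üä</comment>'

title_original = ET.fromstring(original).get("title")
title_rewritten = ET.fromstring(rewritten).get("title")

# A conforming parser resolves the character reference &#xFC; back to "ü",
# so both documents carry the same attribute value.
print(title_original == title_rewritten)
```

So the output file is still correct XML with the same content; the character reference only affects how the file looks to a human reader.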

The developer of the library was probably being overly cautious and called a generic string-escaping function, possibly to reuse its escaping of <, which always has to be escaped, and of ' or ", which have to be escaped when they match the quotation mark that delimits the attribute value.
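
To make the minimal escaping rules concrete, here is a small sketch of an attribute-value escaper that touches only the characters that must be escaped. The function name escape_attribute_value is hypothetical; it is not part of lxml or libxml2, and it illustrates the rule rather than how the library is actually implemented. It also ignores attribute-value whitespace normalization (literal tabs and newlines), which a production serializer would need to consider.

```python
def escape_attribute_value(value: str, quote: str = '"') -> str:
    """Escape only what XML requires inside an attribute value delimited by `quote`.

    Hypothetical helper for illustration; not an lxml API.
    """
    if quote not in ('"', "'"):
        raise ValueError("quote must be a single or double quotation mark")
    # "&" must be handled first so the entities added below are not re-escaped.
    escaped = value.replace("&", "&amp;").replace("<", "&lt;")
    # Only the quotation mark that delimits the value needs escaping.
    if quote == '"':
        escaped = escaped.replace('"', "&quot;")
    else:
        escaped = escaped.replace("'", "&apos;")
    return escaped


print(escape_attribute_value("Kübel"))
print(escape_attribute_value('a < b & "c"'))
```

Non-ASCII characters such as ü pass through unchanged, which is all that is needed as long as the file is written in an encoding that can represent them, such as UTF-8.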

For a concise presentation of the precise escaping requirements, see Simplified XML Escaping.
