lxml XSLT removes CDATA while processing XML

Handling CDATA with lxml means creating a parser with the appropriate option, but what about XSLT? For example:

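from lxml import etree

# strip_cdata=False keeps CDATA sections in the parsed input tree
parser = etree.XMLParser(strip_cdata=False)
tree = etree.parse('sample_with_cdata.xml', parser)
transform = etree.XSLT(etree.parse('dupe.xsl'))
xml_out = transform(tree)
# ...yet the CDATA sections are missing from the written output
xml_out.write('processed.xml')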

If I run an XML file containing CDATA through the lxml XSLT processor, all CDATA sections are stripped. How can I tell the XSLT processor to leave CDATA as is?

P.S. Using the same parser with etree.XSLT doesn't change the outcome.
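To narrow down where the CDATA disappears, one can check the parse step on its own. The sketch below is an editor's illustration, not part of the original question: the inline `SAMPLE` document is a hypothetical stand-in for `sample_with_cdata.xml`, and `reserialize` is a hypothetical helper. It parses with each `strip_cdata` setting and serializes straight back, without any XSLT, which suggests the parser option works and the loss happens later.

```python
from lxml import etree

# Hypothetical stand-in for sample_with_cdata.xml.
SAMPLE = b"<doc><description><![CDATA[a < b]]></description></doc>"


def reserialize(xml_bytes: bytes, strip_cdata: bool) -> bytes:
    """Parse with the given strip_cdata setting and serialize straight back (no XSLT)."""
    parser = etree.XMLParser(strip_cdata=strip_cdata)
    return etree.tostring(etree.fromstring(xml_bytes, parser))


if __name__ == "__main__":
    print(reserialize(SAMPLE, strip_cdata=False))  # CDATA section is kept
    print(reserialize(SAMPLE, strip_cdata=True))   # text is escaped instead
```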

This turned out not to be related to lxml; it was a gap in my own knowledge of XSLT.

In XSLT, CDATA output is controlled by the "cdata-section-elements" attribute of the xsl:output declaration. For example, if the description element contains CDATA that should be kept in the output:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" cdata-section-elements='description' />
...
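A self-contained sketch of the same idea with lxml follows. Since dupe.xsl is not shown, it assumes an identity ("copy everything") stylesheet; `SAMPLE`, `XSL_TEMPLATE` and `run_identity` are hypothetical names for illustration. It runs the transform once without and once with cdata-section-elements="description". The result is serialized with `bytes(result)`, which applies the xsl:output settings.

```python
from lxml import etree

# Hypothetical input standing in for sample_with_cdata.xml.
SAMPLE = b"<doc><description><![CDATA[<b>bold</b> & more]]></description></doc>"

# Assumed identity transform standing in for dupe.xsl (the real one is not shown).
# {cdata} is filled in with the cdata-section-elements attribute, or left empty.
XSL_TEMPLATE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8"{cdata}/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""


def run_identity(xml_bytes: bytes, cdata_elements: str = "") -> bytes:
    """Copy the input through XSLT, optionally listing cdata-section-elements."""
    cdata = f' cdata-section-elements="{cdata_elements}"' if cdata_elements else ""
    transform = etree.XSLT(etree.XML(XSL_TEMPLATE.format(cdata=cdata)))
    parser = etree.XMLParser(strip_cdata=False)  # has no effect on the output
    result = transform(etree.fromstring(xml_bytes, parser))
    # bytes() serializes the result tree according to <xsl:output>.
    return bytes(result)


if __name__ == "__main__":
    print(run_identity(SAMPLE).decode())                 # no CDATA section
    print(run_identity(SAMPLE, "description").decode())  # CDATA section written
```

Note that the output element, not the input, decides whether CDATA is written. The same stylesheet would produce a CDATA section even if the input had used `&lt;` and `&amp;` instead.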

As far as XSLT is concerned, CDATA sections in XML are just noise. XSLT treats <![CDATA["]]> the same as &quot;, which it in turn treats the same as "; they are just different ways for the document author to write the same thing.
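A quick way to see this equivalence in lxml is sketched below; `SPELLINGS` and `string_values` are hypothetical names for illustration. XSLT works on the XPath data model, so the sketch compares the XPath string-value of three documents that use the three spellings above. Even with strip_cdata=False, the values come out identical.

```python
from lxml import etree

# The three spellings from the answer; all denote the same character data.
SPELLINGS = [b'<v><![CDATA["]]></v>', b"<v>&quot;</v>", b'<v>"</v>']


def string_values(docs: list[bytes]) -> list[str]:
    """Return the XPath string-value of each document's root element."""
    parser = etree.XMLParser(strip_cdata=False)
    return [str(etree.fromstring(d, parser).xpath("string(.)")) for d in docs]


if __name__ == "__main__":
    print(string_values(SPELLINGS))  # three identical one-character strings
```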

If you are using CDATA sections in your input to convey information, that is, if <![CDATA[xxx]]> means something different from xxx, then you need to change your XML design.

