Escaped XML with CDATA that also has both escaped data values and tags

I am receiving XML data from a web service that returns all the data as one escaped XML string. However, for whatever reason, part of the XML is enclosed in a CDATA section. The escaped XML within the CDATA section will often contain escaped XML characters as well. Example:

<root>
  <importData>dat</importData>
  <Response>
   <![CDATA[&lt;SecondRoot&gt;
   &lt;Data&gt;123&lt;/Data&gt;
   &lt;DataEscapedCharacterIncluded&gt; 3 &gt; 1&lt;/DataEscapedCharacterIncluded&gt;
   &lt;/SecondRoot&gt;]]>
  &lt;/Response&gt;
&lt;/root&gt;

I need to transform the XML both inside and outside the CDATA section into another XML format with XSL, but I'm having a hard time figuring out how to get this into a usable XML form with either C# or XSL so that I can run the XSL transform into a different format. I would like it to look like this:

  <root>
     <importData>dat</importData>
     <Response>
      <SecondRoot>
       <Data>123</Data>
       <DataEscapedCharacterIncluded> 3 &gt; 1</DataEscapedCharacterIncluded>
      </SecondRoot>
     </Response>
  </root>

The data you show may not be properly escaped. If you unescape it, the result may not be well-formed XML. Consider this line:

&lt;DataEscapedCharacterIncluded&gt; 3 &gt; 1&lt;/DataEscapedCharacterIncluded&gt;

If you unescape it, it becomes this:

<DataEscapedCharacterIncluded> 3 > 1</DataEscapedCharacterIncluded>

This is still well-formed (a greater-than sign does not need to be escaped), but I assume that you'll also have &lt; in there somewhere, which must be escaped. If it is doubly escaped, you should be fine.
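To make the concern concrete, here is a minimal sketch in Python (used only for illustration; the question itself is about C# and XSLT). It removes one level of escaping with the standard-library `xml.sax.saxutils.unescape` and checks whether the result still parses. The sample payloads and the helper name `is_well_formed` are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import unescape


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical payloads, all escaped once at the markup level.
gt_payload = "&lt;Data&gt; 3 &gt; 1&lt;/Data&gt;"        # data holds an escaped '>'
lt_payload = "&lt;Data&gt; 1 &lt; 3&lt;/Data&gt;"        # data holds an escaped '<'
safe_payload = "&lt;Data&gt; 1 &amp;lt; 3&lt;/Data&gt;"  # the data '<' is escaped twice

print(is_well_formed(unescape(gt_payload)))    # <Data> 3 > 1</Data>     -> True
print(is_well_formed(unescape(lt_payload)))    # <Data> 1 < 3</Data>     -> False
print(is_well_formed(unescape(safe_payload)))  # <Data> 1 &lt; 3</Data>  -> True
```

Only the second payload breaks: after one unescape pass, the data-level `<` can no longer be told apart from markup.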

To transform this, there are several things you can do:

  • With XSLT 1.0 or 2.0, transform it in two passes: one that does the unescaping with disable-output-escaping set to yes, and another that does the actual transformation.
  • Use an extension function that takes a string and returns a node set.
  • With XSLT 3.0, use the new function fn:parse-xml or fn:parse-xml-fragment, which can take XML-as-a-string as input.
  • If your entire source is escaped, as it looks like, feed it unescaped to the XSLT processor as explained here. This will also take care of the escaped CDATA (but that part will remain escaped, see below).
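The unescape-then-parse idea behind these options can also be done as a preprocessing step outside XSLT. The sketch below is in Python rather than C# or XSLT, so treat it as an illustration of the steps and not as the method the answer prescribes. It assumes the outer document has already been unescaped into well-formed XML and that the CDATA section holds markup escaped exactly once. The helper name `expand_escaped_child` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import unescape


def expand_escaped_child(parent: ET.Element) -> None:
    """Replace parent's escaped text with the element that text encodes."""
    escaped = (parent.text or "").strip()
    inner = ET.fromstring(unescape(escaped))  # raises ParseError if not well-formed
    parent.text = None
    parent.append(inner)


# Assumption: the outer layer of escaping has already been removed.
source = """<root>
  <importData>dat</importData>
  <Response><![CDATA[&lt;SecondRoot&gt;
   &lt;Data&gt;123&lt;/Data&gt;
   &lt;DataEscapedCharacterIncluded&gt; 3 &gt; 1&lt;/DataEscapedCharacterIncluded&gt;
   &lt;/SecondRoot&gt;]]></Response>
</root>"""

root = ET.fromstring(source)
expand_escaped_child(root.find("Response"))

# The tree is now ordinary XML and can be serialized for an XSLT processor.
print(ET.tostring(root, encoding="unicode"))
```

When the tree is serialized, the `>` in the data is escaped again, so the output stays well-formed. If the payload contained a singly escaped `<` in its data, the inner parse would fail, which is the case the answer warns about.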

What is not entirely clear from your post is whether it is doubly escaped. I.e., if your data looks like this:

<elem><![CDATA[<root>bla</root>]]></elem>

it is singly escaped. If it looks like this:

<elem><![CDATA[&lt;root&gt;bla&lt;/root&gt;]]></elem>

it is doubly escaped. In the latter case, you will need to do an extra unescape cycle before you can process it.
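If you cannot tell in advance which case you have, one rough heuristic is to unescape the CDATA content until it starts with `<`, counting the passes. The Python sketch below is an assumption-laden illustration and not part of the original answer. The function name `unescape_until_markup` and the `max_passes` limit are hypothetical, and the check assumes the payload is meant to be an XML element. Zero passes matches the "singly escaped" case above (CDATA only), and one pass matches the "doubly escaped" case.

```python
from typing import Tuple
from xml.sax.saxutils import unescape


def unescape_until_markup(text: str, max_passes: int = 3) -> Tuple[str, int]:
    """Unescape text until it starts with '<'; return (result, passes used)."""
    current = text.strip()
    passes = 0
    while not current.startswith("<") and passes < max_passes:
        unescaped = unescape(current)
        if unescaped == current:  # nothing left to unescape, give up
            break
        current = unescaped
        passes += 1
    return current, passes


# CDATA content from the two examples above.
for cdata_text in ("<root>bla</root>", "&lt;root&gt;bla&lt;/root&gt;"):
    markup, passes = unescape_until_markup(cdata_text)
    print(passes, markup)
```

Each pass also unescapes the data inside the elements, so the caveat about a singly escaped `<` in the data still applies.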
