Java: extract XML from CDATA

Question

For some reason, someone changed the web service XML response that I need. So now the information I need to fetch is inside a CDATA tag.
The thing is that all "<" and ">" characters have been replaced with "&lt;" and "&gt;".

Example of how it should look:

    <MapAAAResult><![CDATA[<map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxbinkor4.png|vialcap:2</map>
    <nbr>234</nbr>
    <nbrProcess>97 ....

And this is how I am receiving it:

    <MapAAAResult>
    &lt;map&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;
    &lt;nbr&gt;234&lt;/nbr&gt;
    &lt;nbrProcess&gt;97 .....

How can I get the information back into its original form? More precisely, how can I transform it back into XML?

Any ideas?

Thanks!!

Answer 1

Possibly related to the character escaping issue:

HTML inside XML CDATA being converted with &lt; and &gt; brackets

Characters such as "<" and "&" are illegal in XML element content (">" is allowed on its own, but must be escaped in the sequence "]]>"), so they have to be escaped, either by wrapping the text in a CDATA section or by replacing them with the entity references &lt;, &gt; and &amp;. It looks like the web service changed how it encodes its response somewhere along the way.
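To a parser, the two encodings carry the same text. Here is a minimal sketch using the JDK's built-in DOM parser (`javax.xml.parsers.DocumentBuilderFactory`); the class name `CdataVsEscaped`, the helper `rootText` and the shortened `<nbr>` payload are illustrative only and are not taken from the actual web service.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataVsEscaped {

    // Parses an XML string and returns the decoded text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The same (illustrative) payload, once wrapped in CDATA and once entity-escaped.
        String withCdata = "<MapAAAResult><![CDATA[<nbr>234</nbr>]]></MapAAAResult>";
        String escaped = "<MapAAAResult>&lt;nbr&gt;234&lt;/nbr&gt;</MapAAAResult>";

        System.out.println(rootText(withCdata)); // <nbr>234</nbr>
        System.out.println(rootText(escaped));   // <nbr>234</nbr>
    }
}
```

So code that reads the element's text through a parser should see the same string whichever of the two forms the service sends.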

I've run into a similar issue where I had to parse escaped XML. A quick way to get the XML back is to use replaceAll():

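public class UnescapeExample {
    public static void main(String[] args) {
        String data = "<MapAAAResult>"
                + "&lt;map&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;&lt;nbr&gt;234&lt;/nbr&gt;"
                + "&lt;nbrProcess&gt;97";
        // replaceAll() takes a regex; these patterns contain no regex metacharacters.
        // &amp; is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
        data = data.replaceAll("&lt;", "<");
        data = data.replaceAll("&gt;", ">");
        data = data.replaceAll("&amp;", "&");
        System.out.println(data);
    }
}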

You will get back:

<MapAAAResult><map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1</map><nbr>234</nbr><nbrProcess>97...
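As an alternative to string replacement, you can let an XML parser do the unescaping: read the text of `<MapAAAResult>` (which the parser has already decoded), then parse that text a second time. This is a minimal sketch, assuming `<MapAAAResult>` is the root of the string you parse (in a real SOAP response you would first locate it inside the envelope) and that the payload is complete rather than truncated; the class `NestedPayload`, its helpers and the wrapping `<payload>` root element are hypothetical names introduced only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NestedPayload {

    // Parses a string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Decodes the escaped payload of <MapAAAResult> and parses it as XML.
    static Document unwrap(String response) {
        // getTextContent() has already turned &lt; &gt; &amp; back into < > &.
        String inner = parse(response).getDocumentElement().getTextContent();
        // The payload has several top-level elements, so wrap it in a
        // hypothetical <payload> root to make it a well-formed document.
        return parse("<payload>" + inner + "</payload>");
    }

    public static void main(String[] args) {
        String response = "<MapAAAResult>"
                + "&lt;map&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;"
                + "&lt;nbr&gt;234&lt;/nbr&gt;"
                + "&lt;nbrProcess&gt;97&lt;/nbrProcess&gt;"
                + "</MapAAAResult>";
        Document payload = unwrap(response);
        String nbr = payload.getElementsByTagName("nbr").item(0).getTextContent();
        System.out.println(nbr); // 234
    }
}
```

Because `getTextContent()` returns the same string for a CDATA section and for entity-escaped text, this approach should handle either form of the response. Unlike the `replaceAll()` version, it also decodes other references the parser understands, such as `&quot;` or `&#233;`.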

It gets more complicated when a CDATA section contains another CDATA section: the first "]]>" ends the outer section, so the XML parser gets confused, for example:

<xml><![CDATA[ <tag><![CDATA[data]]></tag> ]]></xml>

Thus, escaping the embedded data with &lt;, &gt; and &amp; is more resilient, but it can introduce unnecessary processing. Also note that an XML parser or reader decodes these escaped characters itself when it reads element text.
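The nesting problem can be checked directly. This minimal sketch uses the JDK DOM parser: the nested-CDATA string from the example above is rejected as not well-formed, while the same inner markup escaped with &lt; and &gt; parses. `NestedCdataCheck` and `isWellFormed` are illustrative names; the JDK's default error handler may also print a `[Fatal Error]` line to stderr for the rejected input.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NestedCdataCheck {

    // Returns true if the string parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // parse error: not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        // The outer CDATA section ends at the first "]]>", leaving a stray "</tag>".
        String nestedCdata = "<xml><![CDATA[ <tag><![CDATA[data]]></tag> ]]></xml>";
        // The same inner markup, escaped instead of wrapped in CDATA.
        String escaped = "<xml> &lt;tag&gt;&lt;![CDATA[data]]&gt;&lt;/tag&gt; </xml>";

        System.out.println(isWellFormed(nestedCdata)); // false
        System.out.println(isWellFormed(escaped));     // true
    }
}
```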

Some other related threads:

XSL unescape HTML inside CDATA

When to CDATA vs. Escape & Vice Versa?

Answer 1 (score: 0), 2014-08-06 14:18:30