Java CDATA extract xml

For some reason someone changed the web service XML response that I need, so the information I have to fetch is now inside a CDATA section.
The problem is that all "<" and ">" characters have been replaced with "&lt;" and "&gt;".

This is how it should look:

<MapAAAResult><![CDATA[<map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxbinkor4.png|vialcap:2</map>
    <nbr>234</nbr>
    <nbrProcess>97 ....

And this is how I am receiving it:

    <MapAAAResult>
    &lt;map&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;
    &lt;nbr&gt;234&lt;/nbr&gt;
    &lt;nbrProcess&gt;97 .....

How can I get the information back to its original form? More precisely, how can I turn that escaped text back into XML?

Any ideas?

Thanks!!

Possibly related to the character escaping issue:

HTML inside XML CDATA being converted with &lt; and &gt; brackets

Characters such as "<" and "&" are illegal in XML element content (">" is normally escaped along with them). They can be escaped either by wrapping the text in a CDATA section or by replacing them with the entities &lt;, &gt; and &amp;. It looks like the web service switched from one form to the other somewhere along the way.

I've encountered a similar issue where I had to parse escaped XML. A quick way to get the XML back is to use replaceAll():

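// Escaped payload as received from the web service (truncated like the original)
String data = "<MapAAAResult>"
            + "&lt;map&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;&lt;nbr&gt;234&lt;/nbr&gt;"
            + "&lt;nbrProcess&gt;97";
// Restore the angle brackets first
data = data.replaceAll("&lt;", "<");
data = data.replaceAll("&gt;", ">");
// Unescape &amp; last, so that "&amp;lt;" ends up as "&lt;" and not "<"
data = data.replaceAll("&amp;", "&");
System.out.println(data);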

You will get back:

<MapAAAResult><map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1</map><nbr>234</nbr><nbrProcess>97...
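Since replaceAll() interprets its first argument as a regular expression and the entities here are plain literals, String.replace() does the same job without regex handling. The following is only a sketch of that idea: the class name XmlUnescape and the helper unescapeXml are hypothetical names, and the helper covers just the five predefined XML entities, not numeric character references such as &#60;.

```java
class XmlUnescape {

    // Reverses the five predefined XML entities using literal (non-regex) replacement.
    static String unescapeXml(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                // &amp; goes last so that "&amp;lt;" becomes "&lt;", not "<".
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(unescapeXml("&lt;nbr&gt;234&lt;/nbr&gt;")); // prints <nbr>234</nbr>
        System.out.println(unescapeXml("a &amp;lt; b"));               // prints a &lt; b
    }
}
```

Keeping the &amp; replacement last matters whenever the payload itself contains escaped text; doing it first would unescape such text twice.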

It gets more complex when CDATA sections are embedded inside the first CDATA section. CDATA cannot be nested, so the parser treats the first "]]>" as the end of the outer section and gets confused by what follows, as in:

<xml><![CDATA[ <tag><![CDATA[data]]></tag> ]]></xml>

Thus, escaping the embedded data with &lt; &gt; &amp; is more resilient, but it can introduce unnecessary processing. Also note: XML parsers resolve these escaped characters themselves when they return an element's text, so reading the text through a parser gives you the unescaped string without any manual replacement.
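As a rough illustration of letting the parser do the unescaping, the sketch below parses the response with the JDK's DOM parser, reads the text of the root element, and parses that text a second time. The class name EscapedXmlDemo, the wrapper element <payload>, the example.invalid URL and the closing tags in the sample response are assumptions added for the example; the real response in the question is truncated. The wrapper is needed because the payload contains several top-level elements and would not be a well-formed document on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EscapedXmlDemo {

    // Parses a string into a DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Reads the payload of the outer element and parses it as XML.
    static Document parseInner(String response) throws Exception {
        Document outer = parse(response);
        // getTextContent() returns the text with &lt; &gt; &amp; already resolved;
        // it also returns the raw content of a CDATA section.
        String inner = outer.getDocumentElement().getTextContent();
        // Wrap the fragment in a hypothetical <payload> root so it is well-formed.
        return parse("<payload>" + inner + "</payload>");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, complete version of the response from the question.
        String response = "<MapAAAResult>"
                + "&lt;map&gt;http://example.invalid/image.png|vialcap:1&lt;/map&gt;"
                + "&lt;nbr&gt;234&lt;/nbr&gt;"
                + "&lt;nbrProcess&gt;97&lt;/nbrProcess&gt;"
                + "</MapAAAResult>";
        Document inner = parseInner(response);
        System.out.println(inner.getElementsByTagName("nbr").item(0).getTextContent()); // prints 234
    }
}
```

Because getTextContent() treats escaped text and CDATA content the same way, this approach should keep working if the service switches back to CDATA, provided the payload itself is well-formed.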

Some other related threads:

XSL unescape HTML inside CDATA

When to CDATA vs. Escape & Vice Versa?
