Java CDATA extract xml
For some reason, someone changed the web service XML response that I need, so the information I have to fetch is now inside a CDATA section.
The problem is that all "<" and ">" characters have been replaced with "&lt;" and "&gt;".
Here is how it should look:
<MapAAAResult><![CDATA[<map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxbinkor4.png|vialcap:2</map>
<nbr>234</nbr>
<nbrProcess>97 ....
And this is how I am receiving it:
<MapAAAResult>
&lt;map&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;
&lt;nbr&gt;234&lt;/nbr&gt;
&lt;nbrProcess&gt;97 .....
How can I get the information back to its original form? More precisely, how can I transform it back into XML?
Any ideas?
Thanks!!
Possibly related to the character escaping issue:
HTML inside XML CDATA being converted with &lt; and &gt; brackets
Characters such as "<" and "&" are not allowed unescaped in XML element content (and ">" must be escaped when it follows "]]"), so they are protected either by wrapping the content in a CDATA section or by replacing them with the entities "&lt;", "&gt;" and "&amp;". It looks like the web service switched from CDATA to entity escaping somewhere along the way.
I've encountered a similar issue where I had to parse escaped XML. A quick way to get the XML back is to use replaceAll():
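// Escaped payload as received from the web service (truncated as in the question)
String data = "<MapAAAResult>"
+ "&lt;map&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;&lt;nbr&gt;234&lt;/nbr&gt;"
+ "&lt;nbrProcess&gt;97";
data = data.replaceAll("&lt;", "<");
data = data.replaceAll("&gt;", ">");
// Unescape &amp; last so that "&amp;lt;" becomes the literal text "&lt;", not "<"
data = data.replaceAll("&amp;", "&");
System.out.println(data);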
You will get back:
<MapAAAResult><map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1</map><nbr>234</nbr><nbrProcess>97...
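As an alternative to the string replacement above, a standard XML parser already unescapes entities when it returns text content, so the escaped payload can be read as text and then parsed a second time. The following is only a minimal sketch under some assumptions: the class name EscapedXmlDemo, the helper names, the wrapping "root" element and the example.invalid URL are all hypothetical, and the response is a completed stand-in for the truncated one in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EscapedXmlDemo {
    // Parses an XML string into a DOM document; any parser error is wrapped.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Returns the text inside the root element; the parser has already
    // turned &lt; &gt; &amp; back into < > & at this point.
    static String innerXml(String response) {
        return parse(response).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        // Hypothetical, completed stand-in for the truncated response.
        String response = "<MapAAAResult>"
                + "&lt;map&gt;http://example.invalid/image.png|vialcap:1&lt;/map&gt;"
                + "&lt;nbr&gt;234&lt;/nbr&gt;"
                + "</MapAAAResult>";

        String inner = innerXml(response);
        System.out.println(inner);

        // The inner payload has several top-level elements, so it is wrapped
        // in a hypothetical <root> element before the second parse.
        Document doc = parse("<root>" + inner + "</root>");
        System.out.println(doc.getElementsByTagName("nbr").item(0).getTextContent());
    }
}
```

Compared with replaceAll(), this leaves the unescaping to the parser, which may be preferable when the payload can also contain numeric character references or "&amp;" sequences.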
It can get more complex when CDATA sections are embedded within the first CDATA field, because the outer section ends at the first "]]>" and XML parsing can get confused, such as with:
<xml><![CDATA[ <tag><![CDATA[data]]></tag> ]]></xml>
Thus, escaping the embedded data by using &lt; &gt; &amp; is more resilient but can introduce unnecessary processing. Also note: some parsers or XML readers can recognize the escaped characters.
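To illustrate the two points above, here is a small sketch (the class name CdataVsEscaped and the helper textOf are hypothetical) that uses the JDK's DOM parser. It suggests that a CDATA section and its entity-escaped equivalent yield the same text, that an escaped payload can safely carry an inner CDATA section, and that the nested CDATA example is rejected as not well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class CdataVsEscaped {
    // Returns the text content of the root element, or throws if the
    // input is not well-formed XML.
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String cdata = "<xml><![CDATA[<tag>data</tag>]]></xml>";
        String escaped = "<xml>&lt;tag&gt;data&lt;/tag&gt;</xml>";
        // Both forms give the parser the same text.
        System.out.println(textOf(cdata).equals(textOf(escaped)));

        // Escaping lets the payload itself contain a CDATA section.
        String escapedNested =
                "<xml>&lt;tag&gt;&lt;![CDATA[data]]&gt;&lt;/tag&gt;</xml>";
        System.out.println(textOf(escapedNested));

        // The nested CDATA form ends at the first "]]>" and fails to parse.
        String nested = "<xml><![CDATA[ <tag><![CDATA[data]]></tag> ]]></xml>";
        try {
            textOf(nested);
        } catch (IllegalArgumentException e) {
            System.out.println("nested CDATA rejected");
        }
    }
}
```

The JDK parser may also print its own fatal error message to standard error for the last case.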
Some other related threads:
XSL unescape HTML inside CDATA
When to CDATA vs. Escape & Vice Versa?