
Java CDATA extract xml

For some reason someone changed the web service XML response that I depend on, so the information I need to fetch is now inside a CDATA section.
The problem is that all "<" and ">" characters have been replaced with "&lt;" and "&gt;".

Here is how it should look:

<MapAAAResult><![CDATA[<map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxbinkor4.png|vialcap:2</map>
    <nbr>234</nbr>
    <nbrProcess>97 ....

And this is how I am receiving it:

    <MapAAAResult>
    &lt;mapa&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;
    &lt;nbr&gt;234&lt;/nbr&gt;
    &lt;nbrProcess&gt;97 .....

How can I get the information back to its original form? More precisely, how can I transform it back into XML?

Any ideas?

Thanks!!

Possibly related to the character escaping issue:

HTML inside XML CDATA being converted with &lt; and &gt; brackets

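The characters "<" and "&" cannot appear literally in XML element content (">" is only forbidden as part of the sequence "]]>", but it is commonly escaped as well). They have to be escaped, either by wrapping the text in a CDATA section or by replacing them with the entities &lt;, &gt; and &amp;. It looks like the web service switched from one form to the other somewhere along the way.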
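To illustrate that the two escaping styles carry the same text, here is a minimal sketch using the JDK's built-in DOM parser. The class name `EscapeVsCdata`, the helper `textOf`, and the sample element `<r>` are made up for this example and are not part of the web service in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EscapeVsCdata {
    // Parses the XML string and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical samples: the same payload, once in CDATA and once entity-escaped.
        String viaCdata = "<r><![CDATA[<map>a&b</map>]]></r>";
        String viaEntities = "<r>&lt;map&gt;a&amp;b&lt;/map&gt;</r>";

        System.out.println(textOf(viaCdata));
        System.out.println(textOf(viaEntities).equals(textOf(viaCdata)));
    }
}
```

In both cases the parser hands back the literal markup as text, so code that reads the element through a parser should not need to care which style the service uses.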

I've encountered a similar issue where I had to parse escaped XML. A quick way to get the XML back is String.replace(), which, unlike replaceAll(), treats its arguments as literal text rather than as regular expressions:

String data = "<MapAAAResult>"
            + "&lt;map&gt;http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1&lt;/map&gt;&lt;nbr&gt;234&lt;/nbr&gt;"
            + "&lt;nbrProcess&gt;97";
// Unescape "&amp;" last, so that an escaped "&amp;lt;" ends up as the text "&lt;" and not as "<".
data = data.replace("&lt;", "<");
data = data.replace("&gt;", ">");
data = data.replace("&amp;", "&");
System.out.println(data);

You will get back:

<MapAAAResult><map>http://tstgis.xxxxxxx.xxx/gis_n/WebService1/Users/Image/xxxxxxxxbi542m4.png|vialcap:1</map><nbr>234</nbr><nbrProcess>97...
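As an alternative to plain string replacement, the response can be parsed twice: once to let the XML parser unescape the payload, and once more to read the payload as XML. The following is only a sketch under some assumptions: the response is complete and well-formed, the class name `UnescapeWithParser`, the helper names, the synthetic `<root>` wrapper, and the placeholder URL are all invented for the example. The wrapper is there because the payload contains several sibling elements and a document needs a single root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class UnescapeWithParser {
    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the text of the first <tag> element found inside the escaped payload.
    static String innerValue(String response, String tag) {
        // First pass: the parser turns &lt; and &gt; back into < and >.
        String payload = parse(response).getDocumentElement().getTextContent();
        // Second pass: wrap the sibling elements in a synthetic root and parse them.
        Document inner = parse("<root>" + payload + "</root>");
        return inner.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        // Hypothetical, shortened response; the real one comes from the web service.
        String response = "<MapAAAResult>"
                + "&lt;map&gt;http://example.invalid/a.png|vialcap:1&lt;/map&gt;"
                + "&lt;nbr&gt;234&lt;/nbr&gt;"
                + "</MapAAAResult>";
        System.out.println(innerValue(response, "nbr"));
        System.out.println(innerValue(response, "map"));
    }
}
```

The second parse fails with an exception if the payload is not well-formed, which can be useful as an early warning that the service changed its format again.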

It gets more complex when the data itself contains a CDATA section. CDATA sections cannot be nested: the first "]]>" the parser encounters ends the outer section, so the remainder is no longer treated as CDATA, as in:

<xml><![CDATA[ <tag><![CDATA[data]]></tag> ]]></xml>

Thus, escaping the embedded data with &lt;, &gt; and &amp; is more resilient, although it adds some processing. Also note that an XML parser decodes these escaped characters for you when you read the element's text.
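If the producer still wants to use CDATA for data that may contain "]]>", a common workaround is to split that sequence across two adjacent CDATA sections. Below is a minimal sketch of the idea; the class name `CdataWrap` and its helper methods are hypothetical, and the sample mirrors the nested example above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class CdataWrap {
    // Wraps text in CDATA, splitting every "]]>" so it never closes the section early.
    static String wrapInCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Parses the XML string and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String inner = "<tag><![CDATA[data]]></tag>";
        String xml = "<xml>" + wrapInCdata(inner) + "</xml>";

        System.out.println(xml);
        // The parser joins the adjacent CDATA sections back into the original text.
        System.out.println(textOf(xml).equals(inner));
    }
}
```

Entity escaping avoids this special case entirely, which is one reason it tends to be the more robust choice.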

Some other related threads:

XSL unescape HTML inside CDATA

When to CDATA vs. Escape & Vice Versa?
