简体   繁体   中英

Parse XML with nested xml opening tags <?xml …?> in java

can you help me in parsing xml with nested <?xml version="1.0" encoding="utf-8"?> tags. when i am trying to parse this xml, im getting parsing error.

<?xml version="1.0" encoding="utf-8"?>      
<soap>
            <soapenvBody>
                <serviceResponse>
                    <?xml version="1.0" encoding="UTF-8"?>
                    <data>
                        <respCode>0</respCode>
                    </data>
                </serviceResponse>
            </soapenvBody>
        </soap>  

I don't think this is really a Java problem. Having a second XML declaration within the XML body is just illegal, so I don't think you'll be able to get any XML parsers to parse that. If you have control over the XML (it looks like you're generating it to store a response) then you could try wrapping the inner-XML document with CDATA :

<?xml version="1.0" encoding="utf-8"?>     
<soap>
    <soapenvBody>
        <serviceResponse>
          <![CDATA[
              <?xml version="1.0" encoding="UTF-8"?>
              <data>
                  <respCode>0</respCode>
              </data>
          ]]>
        </serviceResponse>
    </soapenvBody>
</soap>

EDIT:

I'm thinking that you most likely don't want the extra XML declaration inside that response at all. Do you have control over the code that creates the response? My guess is that the XML snippet <data>...</data> is created as a separate DOM object and then the string is spliced in the middle of the response. Writing out the entire XML document object results in the XML declaration being included, but if you just grab the document root node object ( <data> ) and write that out as a string then it probably won't include the extra XML declaration that's causing you all this trouble.

It occurred to me that a parser made for dealing with HTML might be able to do what you want. Since HTML tends to be a total mess compared to strict XML, HTML parsers are usually much more error-tolerant. A quick search turned up jsoup . I was able to pull the respCode from your sample XML above with roughly this code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String data = "your xml goes here";
Document doc = Jsoup.parse(data);
String respCodeRaw = doc.select("respCode").first().text();
int respCode = Integer.valueOf(respCodeRaw);

(I actually tested the library in the Clojure repl, but the code above should work!)

A tag that starts with like <? is a processing instruction. <?xml...> is an XML declaration, and can only be present at the beginning of the xml content. It's not allowed in the XML body.

Why does your soap body contain this? Do you have the option of removing it?

i did not find any parser in java to parse such embedded xml as it is not a valid xml and i guess almost all parses validate the xml before parsing it. so i choose the option to preprocess the xml and selected the inner xml then using SAX parser i parsed the xml and retrieved the values from xml. Guys thanks for your replies.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM