
Parsing a false XML using JAXB

I have a situation where XML (not really XML, but a custom tag-based data format) is sent from a third-party server. Because of that, I can't change the format, and coordinating with the third party is pretty difficult. The markup looks as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <result>SUCCESS</result>
    <req>
      <?xml version="1.0" encoding="UTF-8"?>
      <Secure>
       <Message id="dfgdfdkjfghldkjfgh88934589345">
         <VEReq>
            <version>1.0.2</version><pan>3453243453453</pan>
            <Merchant><acqBIN>433274</acqBIN>
            <merID>3453453245</merID>
            <password>342534534</password>
            </Merchant>
            <Browser></Browser>
         </VEReq>
      </Message>
     </Secure>
    </req>

    <id>1906547421350020</id>
    <trackid>f68fb35c-cbc2-468b-aaf8-7b3f399b709d</trackid>
    <ci>6</ci>

I only want the values of the result, req, id, trackid and ci tags as the parse output. That is, after parsing, req should contain everything inside its tags. Note also that the req tag embeds another XML document as-is, not as CDATA. I can't parse this using JAXB.

Does anybody know a library that can parse all the content if I configure the available tags in a file, or any other way? I don't really need to convert the data to an object; even a HashMap with the tag as key and the content as value is fine. But I would prefer the POJO model (generating a class from this kind of XML).

Let me know if somebody can help me.

Make it well-formed XML first and then pass it to whatever tool you find suitable. JAXB is not bad, as it will ignore elements it does not know (apart from the root element).

And since most (if not all) tools expect well-formed XML anyway, you'll have to take care of turning your "false" XML into "true" XML first. I'd first try something like JTidy or jsoup and see if they help to make your non-well-formed XML well-formed.

If that does not work, I'd try to hack it at the lower level with SAX or StAX parsing. The XML you posted seems to suffer from two problems: no single root element, and an XML declaration in the body. I think both problems can be addressed with some minimal parser hacking.
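For illustration, here is a minimal sketch of the "make it well-formed first" idea, using only JDK classes. It is not a SAX/StAX hack but a simpler string pre-processing step: it strips every XML declaration with a regular expression, wraps the rest in a synthetic root, and then reads the top-level tags into a map. The class name `FalseXmlParser`, the method `parse` and the `wrapper` root element are made up for this example, and the regex assumes the declarations look like the ones in the sample above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class FalseXmlParser {

    // Returns top-level tag name -> content, in document order.
    public static Map<String, String> parse(String raw) throws Exception {
        // Problem 1: XML declarations in the body. Drop all of them,
        // including the one embedded inside <req>.
        String body = raw.replaceAll("<\\?xml\\s[^>]*\\?>", "");
        // Problem 2: no single root element. Add a synthetic one.
        String wellFormed = "<wrapper>" + body + "</wrapper>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The data comes from a third party, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(wellFormed)));

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Map<String, String> result = new LinkedHashMap<>();
        Node child = doc.getDocumentElement().getFirstChild();
        for (; child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.put(child.getNodeName(), content(child, serializer));
            }
        }
        return result;
    }

    // Plain text for leaf tags; serialized markup for tags with child elements.
    private static String content(Node element, Transformer serializer) throws Exception {
        StringWriter out = new StringWriter();
        boolean hasChildElements = false;
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                serializer.transform(new DOMSource(c), new StreamResult(out));
            }
        }
        return hasChildElements ? out.toString().trim() : element.getTextContent().trim();
    }
}
```

Calling `FalseXmlParser.parse(response)` on the sample should give a map with the keys result, req, id, trackid and ci, where req holds the embedded `<Secure>` markup as a string. That string is itself well-formed at this point, so it could be handed to JAXB separately if a POJO is preferred.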

And I think there is a special place in hell for people who invent this type of non-well-formed XML, damned to sit there and correct all the HTML documents on the Internet into valid XHTML by hand.

