
Parsing a "false" XML using JAXB

I have a situation where XML (though it is not really XML data but a tag-based custom data format) is sent from a third-party server. I can't change the format, and coordinating with the third party is pretty difficult. The markup looks as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <result>SUCCESS</result>
    <req>
      <?xml version="1.0" encoding="UTF-8"?>
      <Secure>
       <Message id="dfgdfdkjfghldkjfgh88934589345">
         <VEReq>
            <version>1.0.2</version><pan>3453243453453</pan>
            <Merchant><acqBIN>433274</acqBIN>
            <merID>3453453245</merID>
            <password>342534534</password>
            </Merchant>
            <Browser></Browser>
         </VEReq>
      </Message>
     </Secure>
    </req>

    <id>1906547421350020</id>
    <trackid>f68fb35c-cbc2-468b-aaf8-7b3f399b709d</trackid>
    <ci>6</ci>

Here I want only the values of the result, req, id, trackid and ci tags as the parse output. That is, after parsing, req should contain everything between its tags. One more point: the req tag has another XML document embedded in it as-is, not wrapped in CDATA, so I can't parse it using JAXB.

Is there a library that can parse all of this content if I configure the available tags in a file, or in some other way? I don't really need to convert them to objects; even a HashMap with the tag as key and its content as value would be fine. But I would prefer the POJO model (generating a class from this kind of XML).

Let me know if anybody can help.
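If a tag-to-content map is enough, one option that avoids XML parsing entirely is plain text scanning. The sketch below is an illustration, not an existing library: the class `ConfiguredTagScanner` and the `tags` property key are hypothetical, and it assumes every configured tag appears as a bare `<tag>...</tag>` pair without attributes. It takes the first match anywhere in the text, so it is only safe for tags that never also occur inside `req`; in exchange, `req` comes back verbatim with its embedded XML intact.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class ConfiguredTagScanner {

    /** Reads the tag list from properties text such as "tags=result,req,id" (hypothetical key). */
    static List<String> tagsFromConfig(String propertiesText) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> tags = new ArrayList<>();
        for (String tag : config.getProperty("tags", "").split(",")) {
            String t = tag.trim();
            if (!t.isEmpty()) tags.add(t);
        }
        return tags;
    }

    /**
     * Maps each configured tag to the raw text between its first <tag> and the next </tag>.
     * No XML parsing happens, so nested markup (e.g. inside req) is returned verbatim.
     */
    static Map<String, String> scan(String raw, List<String> tags) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String tag : tags) {
            String open = "<" + tag + ">";
            String close = "</" + tag + ">";
            int start = raw.indexOf(open);
            if (start < 0) continue; // tag not present: leave it out of the map
            int contentStart = start + open.length();
            int end = raw.indexOf(close, contentStart);
            if (end < 0) throw new IllegalArgumentException("no closing tag for <" + tag + ">");
            out.put(tag, raw.substring(contentStart, end).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<result>SUCCESS</result>\n"
                + "<req>\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<Secure><pan>3453243453453</pan></Secure>\n</req>\n"
                + "<id>1906547421350020</id>\n<ci>6</ci>";
        // In practice the properties text would be loaded from a configuration file.
        List<String> tags = tagsFromConfig("tags = result, req, id, trackid, ci");
        scan(raw, tags).forEach((tag, value) -> System.out.println(tag + " -> " + value));
    }
}
```

Because nothing is parsed, the values are plain strings; filling a POJO from the map would be a manual step.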

Make it well-formed XML first and then pass it to whatever tool you find suitable. JAXB is not bad here, as it will ignore elements it does not know (apart from the root element).
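For the data in the question, "making it well-formed" can be as small as removing the embedded XML declarations and adding a root element. The sketch below assumes every `<?xml ...?>` declaration can simply be dropped, removes them with a regular expression and wraps the fragments in a synthetic root (the name `response` and the class `WellFormedFix` are hypothetical). JAXB has not been bundled with the JDK since Java 11, so to stay dependency-free the sketch only checks the result with the JDK's DOM parser; the repaired string is what you would hand to a JAXB `Unmarshaller` bound to a class whose `@XmlRootElement` name matches the synthetic root.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedFix {
    // Any XML declaration, e.g. <?xml version="1.0" encoding="UTF-8"?>, wherever it occurs.
    // Requiring whitespace after "xml" keeps other instructions such as <?xml-stylesheet ...?>.
    private static final Pattern XML_DECL = Pattern.compile("<\\?xml\\s.*?\\?>", Pattern.DOTALL);

    /** Drops every XML declaration and wraps the remaining fragments in one root element. */
    static String toWellFormed(String raw, String rootName) {
        String body = XML_DECL.matcher(raw).replaceAll("");
        return "<" + rootName + ">" + body + "</" + rootName + ">";
    }

    /** Parses with the JDK's DOM parser (fails if still malformed) and lists top-level element names. */
    static List<String> topLevelNames(String wellFormed) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormed)));
            List<String> names = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) names.add(child.getNodeName());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("input is still not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<result>SUCCESS</result>\n"
                + "<req>\n  <?xml version=\"1.0\" encoding=\"UTF-8\"?>\n  <Secure><Message id=\"x\"/></Secure>\n</req>\n"
                + "<id>1906547421350020</id>\n<trackid>f68fb35c</trackid>\n<ci>6</ci>";
        String fixed = toWellFormed(raw, "response"); // "response" is a made-up root name
        System.out.println(topLevelNames(fixed)); // prints [result, req, id, trackid, ci]
    }
}
```

The regular expression is a blunt instrument: it would also strip an `<?xml ...?>` that happens to sit inside a comment or CDATA section, which is acceptable for this data but not in general.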

And since most (if not all) tools expect well-formed XML anyway, you'll have to turn your "false" XML into "true" XML first. I'd start by trying something like JTidy or JSoup and see whether they can make your non-well-formed XML well-formed.

If that does not work, I'd try to hack around it at the lower level of SAX or StAX parsing. The XML you posted seems to suffer from two problems: there is no single root element, and there is an XML declaration inside the body (an XML declaration is only allowed at the very beginning of a document). I think both problems can be addressed with some minimal parser hacking.
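A sketch of that lower-level route with StAX (`javax.xml.stream`, part of the JDK): the declarations are stripped and a synthetic root is added on the string first, then an `XMLStreamReader` walks the top-level elements. Leaf elements yield their text; for an element with children such as `req`, the sketch rebuilds the inner markup from the events by hand. The class name `StaxTagExtractor` is hypothetical, namespace prefixes are ignored, and comments and processing instructions inside `req` are dropped, so treat it as a starting point rather than a finished parser.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxTagExtractor {
    private static final Pattern XML_DECL = Pattern.compile("<\\?xml\\s.*?\\?>", Pattern.DOTALL);

    /** Returns tag -> content for the wanted top-level tags, in document order. */
    static Map<String, String> extract(String raw, Set<String> wanted) {
        // The minimal "hack": drop the stray declarations and add a synthetic root.
        String wrapped = "<root>" + XML_DECL.matcher(raw).replaceAll("") + "</root>";
        Map<String, String> out = new LinkedHashMap<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(wrapped));
            r.nextTag(); // now positioned on <root>
            while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                String content = innerMarkup(r); // leaves r on this element's end tag
                if (wanted.contains(name)) out.put(name, content.trim());
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("input is still not well-formed XML", e);
        }
        return out;
    }

    /** Re-serializes everything between the current start tag and its matching end tag. */
    private static String innerMarkup(XMLStreamReader r) throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        while (true) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    depth++;
                    sb.append('<').append(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        sb.append(' ').append(r.getAttributeLocalName(i))
                          .append("=\"").append(escape(r.getAttributeValue(i))).append('"');
                    }
                    sb.append('>');
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (depth == 0) return sb.toString(); // end tag of the element we started in
                    depth--;
                    sb.append("</").append(r.getLocalName()).append('>');
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    sb.append(escape(r.getText()));
                    break;
                default:
                    break; // comments and processing instructions are dropped
            }
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<result>SUCCESS</result>\n"
                + "<req>\n  <?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "  <Secure><Message id=\"dfgdfdkjfghldkjfgh88934589345\"><pan>3453243453453</pan></Message></Secure>\n</req>\n"
                + "<id>1906547421350020</id>\n<trackid>f68fb35c-cbc2-468b-aaf8-7b3f399b709d</trackid>\n<ci>6</ci>";
        Set<String> wanted = Set.of("result", "req", "id", "trackid", "ci"); // could come from a config file
        extract(raw, wanted).forEach((tag, value) -> System.out.println(tag + " = " + value));
    }
}
```

The `wanted` set is the knob the question asks for: loading it from a configuration file changes which tags are collected without touching the parsing code.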

And I think there is a special place in hell for people who invent this kind of non-well-formed XML: damned to sit there and correct every HTML document on the Internet into valid XHTML by hand.
