
Which XML parser implementation to use to parse only part of an XML file and store it in a DB

I've been searching the web but haven't found anything that meets my requirements, and I'm not sure what to do. I know similar questions have been asked several times, but none is exactly the same as this one.

We have some large XML files (I don't know the exact size yet, but I'd guess less than 1 GB). We only need part of these files (only part of the XSD is useful to us), which we must read and then store in a DB. In the future we'll probably need to recreate the XML files, but that is not covered in this first phase.

I've already seen that JAXB is the better choice for something like this, but I'm a bit confused by the JAXB implementations. There is the JDK implementation, Castor, Metro, EclipseLink MOXy, and I think I've seen at least two more. Which one would be the best to bind this XML to POJO classes and then persist them to a DB with JPA? Is there a better implementation than the ones I've listed? Are any of the ones I've listed out of date? (I ask because many of the pages I've visited are quite old, and I'm not sure whether things have changed in the past few years.)

Performance is important, of course, but the important thing is that we only need part of the elements included in the XML. BTW, this is for use with SG1-XML standard.

Thanks in advance.

Note: I'm the EclipseLink JAXB (MOXy) lead and a member of the JAXB (JSR-222) expert group.

JAXB (JSR-222) is the Java standard for XML binding; it is leveraged by other standards such as JAX-WS (SOAP Web Services) and JAX-RS (RESTful Web Services).

  • Project JAXB (part of Metro) is the reference implementation, and the version of JAXB included in most implementations of the JDK/JRE is derived from it.
  • EclipseLink MOXy is a JAXB-compliant implementation that passes all the necessary compliance tests. It offers useful extensions such as path-based mapping and additional support for mapping JPA entities (EclipseLink also provides a JPA implementation).
  • Castor appears to offer at least a partial JAXB implementation (see: http://docs.codehaus.org/display/CASTOR/Castor+JAXB ). In general I would recommend staying away from anything that only implements part of a specification.

Since the document is large and you only need a portion of it, I would recommend using a JAXB implementation in combination with a StAX parser. You can use an XMLStreamReader to advance to the portion of the document you wish to unmarshal, and only unmarshal the chunk you need.
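As a rough illustration of the streaming half of that approach, here is a minimal sketch that uses only the StAX API shipped with the JDK. The element names (`item`, `name`), the `id` attribute, and the `Item` class are hypothetical stand-ins for whatever part of your schema you need. To keep the sketch runnable without a JAXB dependency, the fields are read by hand; with a JAXB implementation on the classpath you would instead pass the reader, positioned on the start element, to `Unmarshaller.unmarshal(XMLStreamReader, Class<T>)`.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PartialXmlRead {

    // Hypothetical POJO; in a real setup this would be the JAXB-mapped JPA entity.
    public static final class Item {
        public final String id;
        public final String name;

        public Item(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Streams through the document and collects only the <item> elements,
    // ignoring everything else without building a tree in memory.
    public static List<Item> readItems(Reader xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: the input needs no DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        List<Item> items = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    // With JAXB, this is where the chunk would be unmarshalled:
                    // unmarshaller.unmarshal(reader, Item.class).getValue()
                    items.add(readItem(reader));
                }
            }
        } finally {
            reader.close();
        }
        return items;
    }

    // Reads one <item> by hand; the reader must be positioned on its start tag.
    // Assumes <item> elements are not nested inside each other.
    private static Item readItem(XMLStreamReader reader) throws XMLStreamException {
        String id = reader.getAttributeValue(null, "id");
        String name = null;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && "name".equals(reader.getLocalName())) {
                name = reader.getElementText();
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "item".equals(reader.getLocalName())) {
                break;
            }
        }
        return new Item(id, name);
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><header><ignored/></header>"
                + "<items><item id=\"1\"><name>first</name></item>"
                + "<item id=\"2\"><name>second</name></item></items></root>";
        for (Item item : readItems(new StringReader(xml))) {
            System.out.println(item.id + " " + item.name);
        }
    }
}
```

Each `Item` could then be handed to JPA (for example via `EntityManager.persist`) as it is produced, so only one chunk at a time needs to be held in memory.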


Which one would be the best to bind this XML to POJO classes and then to persist to DB with JPA?

As MOXy is a component of EclipseLink, which is the JPA reference implementation, we spend a significant amount of effort on those use cases. I'm the MOXy lead, and I share a cubicle wall with Mike Keith, the former JPA co-spec lead.
