Validation using JAXB and StAX to marshal XML document
I have created an XML schema (foo.xsd) and used xjc to create my binding classes for JAXB. Let's say the root element is collection and I am writing N document objects, which are complex types.
Because I plan to write out large XML files, I am using StAX to write out the collection root element, and JAXB to marshal the document subtrees using Marshaller.marshal(JAXBElement, XMLEventWriter). This is the approach recommended by JAXB's unofficial user's guide.
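To make the approach concrete, here is a minimal sketch of the streaming frame that uses only the JDK's StAX API. It is an illustration under assumptions, not the code from the question: the class name StreamingCollectionSketch and the doc-N text content are hypothetical, it uses XMLStreamWriter rather than XMLEventWriter for brevity, and each document element is written by hand at the point where the JAXB call would go.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration: StAX owns the root element, children are
// written one at a time so only one subtree is in memory at once.
class StreamingCollectionSketch {

    // Writes <collection> with n <document> children to the given Writer.
    static void writeCollection(Writer out, int n) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("collection");
        for (int i = 0; i < n; i++) {
            // Placeholder for the JAXB step (assumption): with a Marshaller whose
            // Marshaller.JAXB_FRAGMENT property is Boolean.TRUE, this block would
            // be replaced by marshaller.marshal(documentElement, xml).
            xml.writeStartElement("document");
            xml.writeCharacters("doc-" + i);
            xml.writeEndElement();
            xml.flush(); // push the finished subtree to the underlying Writer
        }
        xml.writeEndElement(); // closes <collection>
        xml.writeEndDocument();
        xml.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter out = new StringWriter();
        writeCollection(out, 3);
        System.out.println(out);
    }
}
```

Setting the fragment property matters in the real setup because the XML declaration has already been written by StAX, and the marshaller should not try to start a second document for each subtree.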
My question is, how can I validate the XML while it's being marshalled? If I bind a schema to the JAXB marshaller (using Marshaller.setSchema()), I get validation errors because I am only marshalling a subtree (it complains that it is not seeing the collection root element). I suppose what I really want to do is bind a schema to the StAX XMLEventWriter or something like that.
Any comments on this overall approach would be helpful. Basically I want to be able to use JAXB to marshal and unmarshal large XML documents without running out of memory, so if there's a better way to do this, let me know.
Some StAX implementations seem to be able to validate output; see the answer to a similar question.
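Separately from the answer referenced above, and as a swapped-in technique rather than anything the posters proposed, the JDK's javax.xml.validation.ValidatorHandler can validate a whole document while its subtrees arrive one at a time as SAX events. The sketch below is JDK-only and rests on assumptions: the class name StreamingValidationSketch, the inline schema, and the hand-fed events are all hypothetical stand-ins. With JAXB, the inner loop body would presumably be a call to marshaller.marshal(element, handler) with Marshaller.JAXB_FRAGMENT set to Boolean.TRUE, so that the marshaller does not call startDocument() and endDocument() itself.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical illustration: validate a streamed document against a schema
// by feeding SAX events to a ValidatorHandler.
class StreamingValidationSketch {

    // Assumed schema: a collection of zero or more string-valued documents.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='collection'><xs:complexType><xs:sequence>"
      + "<xs:element name='document' type='xs:string'"
      + " minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Streams <collection> with n children named childName through the
    // validator; returns false as soon as a validation error is reported.
    static boolean isValid(String childName, int n) throws SAXException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler handler = schema.newValidatorHandler();
        handler.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        AttributesImpl noAttrs = new AttributesImpl();
        try {
            handler.startDocument();
            handler.startElement("", "collection", "collection", noAttrs);
            for (int i = 0; i < n; i++) {
                // Stand-in for marshalling one subtree into the handler.
                handler.startElement("", childName, childName, noAttrs);
                char[] text = ("doc-" + i).toCharArray();
                handler.characters(text, 0, text.length);
                handler.endElement("", childName, childName);
            }
            handler.endElement("", "collection", "collection");
            handler.endDocument();
            return true;
        } catch (SAXException e) {
            return false; // the schema rejected the event stream
        }
    }

    public static void main(String[] args) throws SAXException {
        System.out.println("document children valid: " + isValid("document", 3));
        System.out.println("wrong children valid: " + isValid("wrong", 3));
    }
}
```

To also produce output, ValidatorHandler.setContentHandler() can forward the validated events to a downstream ContentHandler that serializes them. Whether this fits better than a validating StAX writer depends on the implementation in use.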
You can make your root collection lazy and instantiate items only when the Marshaller calls Iterator.next(). Then a single call to marshal() will produce a huge validated XML. You won't run out of memory, because the beans that are already serialized get collected by GC.
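The lazy-collection idea can be shown in isolation, without JAXB. The following is a minimal sketch under assumptions: LazyList, its factory parameter, and createdCount() are hypothetical names introduced here for illustration. It relies on the fact that AbstractList's iterator calls get(index) for each step, so an element exists only from the moment the consumer asks for it.

```java
import java.util.AbstractList;
import java.util.function.IntFunction;

// Hypothetical illustration: a fixed-size list that builds each element on
// demand and keeps no reference to it afterwards.
class LazyList<T> extends AbstractList<T> {
    private final int size;
    private final IntFunction<T> factory;
    private int created; // how many elements have been built so far

    LazyList(int size, IntFunction<T> factory) {
        this.size = size;
        this.factory = factory;
    }

    @Override
    public T get(int index) {
        created++;
        return factory.apply(index); // nothing is cached, so the GC can reclaim it
    }

    @Override
    public int size() {
        return size;
    }

    int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        LazyList<String> items = new LazyList<>(5, i -> "item-" + i);
        System.out.println("created before iteration: " + items.createdCount());
        for (String item : items) {
            System.out.println(item);
        }
        System.out.println("created after iteration: " + items.createdCount());
    }
}
```

Because nothing is cached, asking for the same index twice builds the element twice; that trade-off is what keeps memory flat when the consumer walks the list once.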
Also, it's OK to return null as a collection element if it needs to be conditionally skipped. There won't be an NPE.
The XML schema validator itself seems to consume little memory even on huge XMLs.
See JAXB's ArrayElementProperty.serializeListBody().
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Marshaller;
import javax.xml.bind.SchemaOutputResolver;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAnyElement;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.namespace.QName;
import javax.xml.transform.Result;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "TestHuge")
public class TestHuge {

    // When true, the Header element is emitted last instead of first,
    // which makes schema validation fail during marshalling.
    static final boolean MISPLACE_HEADER = true;
    private static final int LIST_SIZE = 20000;
    static final String HEADER = "Header";
    static final String DATA = "Data";

    @XmlElement(name = HEADER)
    String header;
    @XmlElement(name = DATA)
    List<String> data;
    @XmlAnyElement
    List<Object> content;

    public static void main(final String[] args) throws Exception {
        final JAXBContext jaxbContext = JAXBContext.newInstance(TestHuge.class);
        final Schema schema = genSchema(jaxbContext);
        final Marshaller marshaller = jaxbContext.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        marshaller.setSchema(schema);
        final TestHuge instance = new TestHuge();
        // Lazy list: each element is created only when the marshaller asks for it.
        instance.content = new AbstractList<Object>() {
            @Override
            public Object get(final int index) {
                return instance.createChild(index);
            }

            @Override
            public int size() {
                return LIST_SIZE;
            }
        };
        // throws MarshalException ... Invalid content was found starting with element 'Header'
        // The Writer discards everything, so only validation and memory use are exercised.
        marshaller.marshal(instance, new Writer() {
            @Override
            public void write(final char[] cbuf, final int off, final int len) throws IOException {}

            @Override
            public void write(final int c) throws IOException {}

            @Override
            public void flush() throws IOException {}

            @Override
            public void close() throws IOException {}
        });
    }

    private JAXBElement<String> createChild(final int index) {
        if (index % 1000 == 0) {
            System.out.println("serialized so far: " + index);
        }
        final String tag = index == getHeaderIndex(content) ? HEADER : DATA;
        final String bigStr = new String(new char[1000000]);
        return new JAXBElement<String>(new QName(tag), String.class, bigStr);
    }

    private static int getHeaderIndex(final List<?> list) {
        return MISPLACE_HEADER ? list.size() - 1 : 0;
    }

    // Generates the schema for the JAXB context in memory and compiles it.
    private static Schema genSchema(final JAXBContext jc) throws Exception {
        final List<StringWriter> outs = new ArrayList<>();
        jc.generateSchema(new SchemaOutputResolver() {
            @Override
            public Result createOutput(final String namespaceUri, final String suggestedFileName)
                    throws IOException {
                final StringWriter out = new StringWriter();
                outs.add(out);
                final StreamResult streamResult = new StreamResult(out);
                streamResult.setSystemId("");
                return streamResult;
            }
        });
        final StreamSource[] sources = new StreamSource[outs.size()];
        for (int i = 0; i < outs.size(); i++) {
            final StringWriter out = outs.get(i);
            sources[i] = new StreamSource(new StringReader(out.toString()));
        }
        final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = sf.newSchema(sources);
        return schema;
    }
}