
How to validate big XML against an XSD schema?

I need to validate a big XML file with limited memory usage. Every piece of code I've found so far ends with an out-of-memory error.

Methods I tried:

    // method 1
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(false);
    factory.setNamespaceAware(true);

    SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
    factory.setSchema(schemaFactory.newSchema(new Source[] {new StreamSource(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile())}));
    SAXParser parser = factory.newSAXParser();
    XMLReader reader = parser.getXMLReader();
    reader.setErrorHandler(new SimpleErrorHandler());
    reader.parse(new InputSource(inputXml));

    // method 2
    XMLValidationSchemaFactory sf = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
    XMLValidationSchema vs = sf.createSchema(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd"));
    XMLStreamReader2 sr = (XMLStreamReader2) XMLInputFactory2.newInstance().createXMLStreamReader(new FileInputStream(inputXml));
    sr.validateAgainst(vs);
    try {
        while (sr.hasNext()) {
            sr.next();
        }
        System.out.println("Validated ok!");
    } catch (XMLValidationException ve) {
        System.err.println("Validation problem: " + ve);
        isValid = false;
    }
    sr.close();

    // method 3
    SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
    String fileName = Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile();

    Schema schema = factory.newSchema(new File(fileName));
    Validator validator = schema.newValidator();

    // create a source from a file
    StreamSource source = new StreamSource(new File(inputXml));

    // check input
    validator.validate(source);

I get an OutOfMemoryError every time.

EDIT

With XOM:

    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(false);
    factory.setNamespaceAware(true);

    SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
    factory.setSchema(schemaFactory.newSchema(new Source[] {new StreamSource(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile())}));
    SAXParser parser = factory.newSAXParser();
    XMLReader reader = parser.getXMLReader();
    reader.setErrorHandler(new SimpleErrorHandler());

    Builder builder = new Builder(reader);
    builder.build(new FileInputStream(new File(inputXml)));

Memory usage is still very high: about 250 MB of heap for a 15 MB XML file. Stack trace:

Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
at java.lang.StringBuffer.append(StringBuffer.java:322)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleCharacters(XMLSchemaValidator.java:1574)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.characters(XMLSchemaValidator.java:789)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:441)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
at nu.xom.Builder.build(Unknown Source)
at nu.xom.Builder.build(Unknown Source)

EDIT: My XML contains a large base64 string.

Take a look at Marco Tedone's article on XML unmarshalling. Based on his conclusions, I would recommend StAX for low memory consumption:

    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(fileInputStream);
    Validator validator = schema.newValidator();
    validator.validate(new StAXSource(xmlStreamReader));
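
For completeness, here is a minimal self-contained sketch of the same idea using only the JDK's JAXP classes. The class name `StaxValidationSketch`, the method `isValid`, and the tiny inline schema are made up for illustration and are not from the question. Whether this avoids the OutOfMemoryError for a document with a very large base64 text node is not established here: the stack trace in the question shows the schema validator itself buffering character data, so treat this as something to try rather than a guaranteed fix.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class, named for this example only.
class StaxValidationSketch {

    // Returns true if the XML validates against the XSD, false if validation fails.
    static boolean isValid(Reader xml, Reader xsd) {
        Schema schema;
        try {
            // Compile the schema first, so schema problems are reported separately.
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(xsd));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                // With no ErrorHandler set, the validator throws on the first error.
                schema.newValidator().validate(new StAXSource(reader));
                return true;
            } finally {
                reader.close();
            }
        } catch (SAXException e) {
            return false; // validation (or well-formedness) problem
        } catch (XMLStreamException | IOException e) {
            throw new IllegalStateException("could not read the document", e);
        }
    }

    public static void main(String[] args) {
        // Tiny illustrative schema: a single <a> element holding an integer.
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='a' type='xs:int'/></xs:schema>";
        System.out.println(isValid(new StringReader("<a>1</a>"), new StringReader(xsd)));
        System.out.println(isValid(new StringReader("<a>x</a>"), new StringReader(xsd)));
    }
}
```

For a real file you would pass a `FileReader` or wrap a `FileInputStream` instead of the in-memory strings used in `main`.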

It's possible that the memory is being used for the schema, not the source document. You haven't said anything about the schema. Some schemas can use very large amounts of memory, for example if the content model has large finite values of minOccurs or maxOccurs. At what point does the out-of-memory exception occur?
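
One rough way to find out is to compile the schema on its own, before touching the document, and log heap usage around that step. The sketch below assumes nothing about the real schema: `SchemaMemoryProbe`, `usedHeap`, `compile`, and the inline XSD are hypothetical names and data for illustration. Heap readings taken this way are noisy (the garbage collector may run at any time), so they only give a coarse indication.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical probe class, named for this example only.
class SchemaMemoryProbe {

    // Approximate number of heap bytes currently in use.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Compiles the schema without parsing any instance document.
    static Schema compile(Reader xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(xsd));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative schema with finite minOccurs/maxOccurs bounds.
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='list'><xs:complexType><xs:sequence>"
                + "<xs:element name='item' type='xs:string' minOccurs='2' maxOccurs='100'/>"
                + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

        long before = usedHeap();
        Schema schema = compile(new StringReader(xsd));
        long after = usedHeap();

        System.out.println("schema compiled: " + (schema != null));
        System.out.println("approx. heap change in bytes: " + (after - before));
    }
}
```

If the error already appears during `newSchema`, the schema is the likely culprit; if compilation is cheap and the error only appears during `validate`, the document (here, the large base64 text) is the more likely cause.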
