
SAX: XML document structures must start and end within the same entity

I'm trying to parse (fairly big) XML files using javax.xml.stream.XMLStreamReader. The files are well-formed (validated with xmllint), but I still get the following exception:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[12418,95]
Message: XML document structures must start and end within the same entity.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:592)

This is a simplification of my code:

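// parser is a javax.xml.stream.XMLStreamReader
boolean inSentenceElement = false; // assumed to track whether we are inside <s>
while (parser.hasNext()) {
    int event = parser.next();
    if (event == XMLStreamReader.START_ELEMENT) {
        // compare strings with equals(), not ==
        if ("s".equals(parser.getLocalName())) {
            inSentenceElement = true;
            // do stuff
        }
    } else if (event == XMLStreamReader.END_ELEMENT) {
        if ("s".equals(parser.getLocalName())) {
            inSentenceElement = false;
            // do more stuff
        }
    } else if (event == XMLStreamReader.CHARACTERS) {
        if (inSentenceElement) {
            // process text
            String text = parser.getText();
        }
    }
}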

I've checked the row/col in the XML as given in the error message, and nothing there strikes me as unusual. I suspect the size of the files might be the problem: they may get truncated, so that an EOF is read before the root element is closed. Is that plausible and, if so, how can I avoid it?

Edit: the bz2-compressed files are up to 1.5G in size with up to 7M lines, but relatively small files of 4M also crash after around 10K lines (although the number of lines after which the problem occurs tends to vary by some 3K lines).
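One way to test the truncation hypothesis is to count how many bytes actually reach the parser and how many elements are still open when it fails. The following is a minimal sketch, not the asker's code: the class TruncationCheck, its CountingInputStream helper, and the check method are hypothetical names, and the small in-memory documents in main stand in for the real (decompressed) input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class TruncationCheck {

    /** Hypothetical helper: counts the bytes handed to the parser. */
    static final class CountingInputStream extends FilterInputStream {
        long count = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    /** Returns a one-line report; it starts with "OK" if the document closed cleanly. */
    static String check(InputStream raw) {
        CountingInputStream in = new CountingInputStream(raw);
        int depth = 0; // number of currently open elements
        try {
            XMLStreamReader parser = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (parser.hasNext()) {
                int event = parser.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            parser.close();
            return "OK: " + in.count + " bytes read, depth " + depth;
        } catch (XMLStreamException e) {
            return "FAILED: " + in.count + " bytes read, " + depth
                    + " open element(s): " + e.getMessage();
        }
    }

    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(check(stream("<root><s>abc</s></root>"))); // complete document
        System.out.println(check(stream("<root><s>abc</s>")));        // root never closed
    }
}
```

If the byte count reported at failure is smaller than the size of the decompressed file, the stream feeding the parser ended early, which would point at the decompression or I/O layer rather than at the XML itself.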

Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,4207737]
Message: Attribute name "i" associated with an element type "someElement" must be followed by the ' = ' character.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:181)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:355)
    ... 49 more

The attribute in the actual XML is index="1", so it is valid, but it seems to be getting truncated. The same code and XML worked with Java 1.7.0u51, but fail with the above exception on 1.7.0u71. The location is always at the same column (CharacterOffset = 4207736) for that file. I'm using JAXB, which calls this during unmarshalling, but nothing has changed other than the Java version.

I would recommend checking the XML processing limits that were recently added to mitigate denial-of-service attacks; doing so worked in my case. https://docs.oracle.com/javase/tutorial/jaxp/limits/using.html

Specifically, adding the following options to the java command line disables all of them. I would STRONGLY recommend finding better limits (or the specific one that causes your problem) instead of turning them all off with 0.

java -Djdk.xml.entityExpansionLimit=0 -Djdk.xml.elementAttributeLimit=0 -Djdk.xml.maxOccurLimit=0 -Djdk.xml.totalEntitySizeLimit=0 -Djdk.xml.maxGeneralEntitySizeLimit=0 -Djdk.xml.maxParameterEntitySizeLimit=0 -Djdk.xml.maxElementDepth=0 -jar myJarfile.jar
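The same system properties can also be set from code instead of on the command line. This is a minimal sketch under two assumptions: that the limit is read when the factory is created (so the property has to be set first), and that jdk.xml.totalEntitySizeLimit is the one that matters, which this thread does not establish. XmlLimitsSketch, relaxLimit, countElements, and the value 200000000 are placeholders for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class XmlLimitsSketch {

    /** Sets one JAXP limit as a system property; call this before creating any factory. */
    static void relaxLimit(String name, String value) {
        System.setProperty(name, value);
    }

    /** Parses the document with a freshly created factory and counts its start tags. */
    static int countElements(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader parser = factory.createXMLStreamReader(new StringReader(xml));
        int elements = 0;
        while (parser.hasNext()) {
            if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                elements++;
            }
        }
        parser.close();
        return elements;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Raise a single limit to a finite value instead of disabling everything with 0.
        // Both the property choice and the number are assumptions, not a recommendation.
        relaxLimit("jdk.xml.totalEntitySizeLimit", "200000000");
        System.out.println(countElements("<root><s>a</s><s>b</s></root>"));
    }
}
```

Raising one limit at a time makes it possible to find out which limit, if any, is responsible, in line with the advice above not to switch all of them off.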


