
XMLStreamReader Problem

I'm using the XMLStreamReader interface from javax.xml.stream to parse an XML file. The file contains huge amounts of data, and single text nodes can be several KB long.

Validation and reading generally work very well, but I'm having trouble with text nodes longer than 15k characters. The problem occurs in this function:

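// xsr is the XMLStreamReader, expected to be positioned on a CHARACTERS event
private String readText(XMLStreamReader xsr) throws XMLStreamException {
    String foo = "";
    if (xsr.getEventType() == XMLStreamConstants.CHARACTERS) {
        foo = xsr.getText(); // only returns the first part of a long text node
        xsr.next(); // read next tag
    }
    return foo;
}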

Here xsr is the stream reader. In this particular case the text node is 53,337 characters long (the length varies), but xsr.getText() returns only the first 15,537 of them. Of course I could loop over the function and concatenate the strings, but somehow I don't think that's the idea...

I couldn't find anything about this in the documentation or anywhere else. Is this intended behavior? Can someone confirm or deny it? Am I using it the wrong way?

Thanks

Of course I could loop over the function and concatenate the strings, but somehow I don't think that's the idea...

Actually, that is the idea :)

The parser is permitted to break up the event stream however it wishes, as long as it's consistent with the original document. That means it can, and often will, break up your text data into multiple events. How and when it chooses to do so is an implementation detail internal to the parser, and is essentially unpredictable.
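To see this splitting in practice, the sketch below feeds one large text node to whatever StAX implementation XMLInputFactory.newFactory() picks up and records the length of each character-data event it reports. The class name ChunkDemo and the test document are made up for illustration. How many events you get, and how long each one is, depends on the parser and its internal buffer size, so don't rely on any particular numbers; only the total length is fixed by the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ChunkDemo {

    // Records the length of every character-data event in the document.
    // SPACE is included because some parsers may report whitespace-only text that way.
    static List<Integer> chunkLengths(String xml) throws XMLStreamException {
        XMLStreamReader xsr = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        List<Integer> lengths = new ArrayList<>();
        while (xsr.hasNext()) {
            int event = xsr.next();
            if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.SPACE) {
                lengths.add(xsr.getTextLength());
            }
        }
        xsr.close();
        return lengths;
    }

    public static void main(String[] args) throws XMLStreamException {
        // A single text node of 6 * 65536 = 393216 characters
        StringBuilder doc = new StringBuilder("<root>");
        for (int i = 0; i < 65536; i++) {
            doc.append("hello\n");
        }
        doc.append("</root>");

        List<Integer> lengths = chunkLengths(doc.toString());
        int total = lengths.stream().mapToInt(Integer::intValue).sum();
        // The number of events is up to the parser; their total length is not
        System.out.println(lengths.size() + " events, " + total + " characters in total");
    }
}
```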

So yes, if you receive multiple sequential CHARACTERS events, you need to append them manually. This is the price you pay for a low-level API.
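A minimal sketch of that manual concatenation, assuming the reader is already positioned on the first CHARACTERS event of the text node, as in the question's function. The names TextAccumulator and readText are hypothetical. The loop also accepts CDATA and SPACE events, on the assumption that you want all character data of the node; drop those cases if you don't. Like the original function, it leaves the reader on the next non-text event (here the closing tag).

```java
import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TextAccumulator {

    // Hypothetical helper: appends every consecutive character-data event
    // (CHARACTERS, CDATA or SPACE) starting at the current position and
    // leaves the reader on the first event that is not character data.
    static String readText(XMLStreamReader xsr) throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        while (isCharacterData(xsr.getEventType())) {
            sb.append(xsr.getText());
            xsr.next();
        }
        return sb.toString();
    }

    private static boolean isCharacterData(int event) {
        return event == XMLStreamConstants.CHARACTERS
                || event == XMLStreamConstants.CDATA
                || event == XMLStreamConstants.SPACE;
    }

    public static void main(String[] args) throws XMLStreamException {
        StringBuilder doc = new StringBuilder("<root>");
        for (int i = 0; i < 65536; i++) {
            doc.append("hello\n");
        }
        doc.append("</root>");

        XMLStreamReader xsr = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(doc.toString()));
        xsr.nextTag(); // positioned on <root>
        xsr.next();    // first chunk of the text node
        String text = readText(xsr);
        System.out.println(text.length()); // 393216
        System.out.println(xsr.getEventType() == XMLStreamConstants.END_ELEMENT); // true
    }
}
```

If the text is the whole content of a text-only element, the standard XMLStreamReader.getElementText() method does the same job: called on the START_ELEMENT, it returns the element's concatenated text and leaves the reader on the matching END_ELEMENT.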

Another option is the javax.xml.stream.isCoalescing property (the XMLInputFactory.IS_COALESCING constant, documented in XMLStreamReader.next() and in the "Using StAX" tutorial), which makes the parser merge adjacent character data, so long text arrives as a single string. The following JUnit 3 test passes.

Warning: isCoalescing probably shouldn't be used in production, because if the document has lots of character references (&#160;) or entity references (&lt;), it will cause a StackOverflowError!

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

import junit.framework.TestCase;

public class XmlStreamTest extends TestCase {
    public void testLengthInXmlStreamReader() throws XMLStreamException {
        // Build a document whose single text node is 6 * 65536 characters long
        StringBuilder b = new StringBuilder();
        b.append("<root>");
        for (int i = 0; i < 65536; i++) {
            b.append("hello\n");
        }
        b.append("</root>");
        InputStream is = new ByteArrayInputStream(b.toString().getBytes(StandardCharsets.UTF_8));

        // Ask the parser to merge adjacent character data into one CHARACTERS event
        XMLInputFactory inputFactory = XMLInputFactory.newFactory();
        inputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLStreamReader reader = inputFactory.createXMLStreamReader(is);

        reader.nextTag(); // move to <root>
        assertEquals(XMLStreamConstants.CHARACTERS, reader.next()); // the whole text node
        assertEquals(6 * 65536, reader.getTextLength());
    }
}
