Obtain InputStream from XML element content

My servlet's doPost() receives an HttpServletRequest whose ServletInputStream delivers a large chunk of base64-encoded data wrapped in XML. For example, there is an element:

<filedata encoding="base64">largeChunkEncodedHere</filedata>

I need to decode the chunk and write it to a file. I would like to get an InputStream from the chunk, decode it as a stream using MimeUtility, and use that stream to write the file. I would prefer not to read this large chunk into memory.

The XML is flat; i.e., there is not much nesting. My first idea is to use a SAX parser, but I don't know how to do the hand-off to a stream that reads just the chunk.

Thanks for your ideas.

Glenn

Edit 1: Note JB Nizet's pessimistic answer in this post.

Edit 2: I've answered my own question affirmatively below. I have marked maximdim's answer below as correct: it doesn't quite answer the question, but it did direct me to the StAX API and Woodstox.

One more suggestion regarding Woodstox: it can also decode the base64-encoded content itself, efficiently. To do that, you need to cast the XMLStreamReader to XMLStreamReader2 (or TypedXMLStreamReader), which is part of the Stax2 extension API.

With that cast, you get the methods readElementAsBinary() and getElementAsBinary(), which handle Base64 decoding automatically. XMLStreamWriter2 similarly has Base64-encoding methods for writing binary data.

You could use a SAX filter or XPath to get only the element(s) you're interested in. Once you have the content of your element, pass it to MimeUtility.decode() and write the resulting stream to a file.

I suggest you update your question with a code sample and let us know what doesn't work.
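As a rough illustration of the SAX route, the sketch below uses only the JDK: a handler forwards the characters() chunks of one element to a small incremental decoder built on java.util.Base64 (swapped in here for MimeUtility so the example has no dependencies). SaxBase64Extractor, ChunkedBase64Decoder and extract() are hypothetical names, and the code assumes the element is not nested inside itself and that the payload is plain base64.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; none of these names come from the original posts.
final class SaxBase64Extractor {

    /** Decodes base64 text that arrives in arbitrary chunks, buffering at most 4096 chars. */
    static final class ChunkedBase64Decoder {
        private final OutputStream out;
        // Length is a multiple of 4, so a full buffer always ends on a base64 group boundary.
        private final byte[] pending = new byte[4096];
        private int filled = 0;

        ChunkedBase64Decoder(OutputStream out) { this.out = out; }

        void accept(char[] ch, int start, int length) throws IOException {
            for (int i = start; i < start + length; i++) {
                char c = ch[i];
                if (Character.isWhitespace(c)) continue; // skip line breaks inside the payload
                if (c > 127) throw new IOException("Not a base64 character: " + c);
                pending[filled++] = (byte) c;
                if (filled == pending.length) flush();
            }
        }

        /** Decodes whatever is left once the element has ended. */
        void finish() throws IOException { flush(); }

        private void flush() throws IOException {
            if (filled == 0) return;
            try {
                out.write(Base64.getDecoder().decode(Arrays.copyOf(pending, filled)));
            } catch (IllegalArgumentException e) {
                throw new IOException("Malformed base64 data", e);
            }
            filled = 0;
        }
    }

    /** Parses xml and writes the decoded content of the named element to out. */
    static void extract(InputStream xml, String elementName, OutputStream out) throws Exception {
        ChunkedBase64Decoder decoder = new ChunkedBase64Decoder(out);
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (elementName.equals(qName)) inside = true;
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                if (!inside) return;
                try {
                    decoder.accept(ch, start, length); // SAX may call this many times per element
                } catch (IOException e) {
                    throw new SAXException(e);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (!elementName.equals(qName)) return;
                inside = false;
                try {
                    decoder.finish();
                } catch (IOException e) {
                    throw new SAXException(e);
                }
            }
        };
        // The default factory is not namespace-aware, so qName carries the element name.
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
    }
}
```

In a servlet, the natural call would be something like `SaxBase64Extractor.extract(request.getInputStream(), "filedata", Files.newOutputStream(target))`, where `target` is whatever path you choose. Only one buffer of encoded text is held at a time, so the chunk is never fully in memory.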

Update:

Here is sample code using a StAX2 parser (Woodstox). For some reason the StAX parser included in the JDK doesn't seem to have a comparable getText() method, at least at a quick glance.

Obviously the input (r) and output (w) could be any Reader/Writer or stream; a String is used here only as an example.

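    // Imports needed by this fragment (the stax2 types come from Woodstox / stax2-api):
    // import java.io.Reader; import java.io.StringReader;
    // import java.io.StringWriter; import java.io.Writer;
    // import javax.xml.stream.XMLStreamConstants;
    // import org.codehaus.stax2.XMLInputFactory2;
    // import org.codehaus.stax2.XMLStreamReader2;

    Reader r = new StringReader("<foo><filedata encoding=\"base64\">largeChunkEncodedHere</filedata></foo>");
    Writer w = new StringWriter();

    XMLInputFactory2 xmlif = (XMLInputFactory2) XMLInputFactory2.newInstance();
    XMLStreamReader2 sr = (XMLStreamReader2) xmlif.createXMLStreamReader(r);

    boolean inFileData = false; // true once <filedata> has been opened
    while (sr.hasNext()) {
        int event = sr.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
            if ("filedata".equals(sr.getLocalName())) {
                inFileData = true;
            }
        }
        else if (event == XMLStreamConstants.CHARACTERS) {
            if (inFileData) {
                // Stax2 extension: write the text of the current event straight to the Writer
                sr.getText(w, false);
                break;
            }
        }
    }
    System.out.println(w);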

Here are some details on how streaming from an element while parsing with StAX is possible, using the Woodstox framework.

There is a good overview in this article.

From XMLInputFactory we can call createXMLStreamReader(java.io.InputStream stream), passing it the ServletInputStream. With Woodstox, the returned reader can be cast to XMLStreamReader2, which has a getText(Writer w, boolean preserveContents) method that returns an int giving the number of characters written. XMLStreamReader2 is an interface, so this method needs an implementation. Stax2ReaderImpl provides this one:

// // // StAX2, Pass-through text accessors
public int getText(Writer w, boolean preserveContents)
    throws IOException, XMLStreamException
{
    char[] cbuf = getTextCharacters();
    int start = getTextStart();
    int len = getTextLength();

    if (len > 0) {
        w.write(cbuf, start, len);
    }
    return len;
}

This implementation fetches the whole text as a single char[] from getTextCharacters(), so to stream we would need to change it so that it reads from the InputStream incrementally. In the Woodstox test class TestGetSegmentedText, the testSegmentedGetCharacters() method uses sr.getTextCharacters(offset, buf, start, len). In fact, the javadoc for the multi-argument XMLStreamReader.getTextCharacters() shows the following sample loop:

int length = 1024;
char[] myBuffer = new char[ length ];
for ( int sourceStart = 0 ; ; sourceStart += length ) {
    int nCopied = stream.getTextCharacters( sourceStart, myBuffer, 0, length );
    if (nCopied < length) {
        break;
    }
}
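To make that loop concrete, here is a minimal sketch that uses only the StAX parser bundled with the JDK (not Woodstox) and copies the text of one element to a Writer in 1024-character slices. StaxTextCopier and copyElementText() are hypothetical names. The sketch assumes the element is not nested inside itself, and it relies on the parser reporting long text as several CHARACTERS events when coalescing is off, which is implementation-dependent.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; the names are illustrative only.
final class StaxTextCopier {

    /** Copies the text content of the named element to out; returns the number of chars written. */
    static long copyElementText(Reader xml, String elementName, Writer out)
            throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Without coalescing the parser may deliver long text as several smaller events.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        XMLStreamReader sr = factory.createXMLStreamReader(xml);

        char[] buf = new char[1024];
        long total = 0;
        boolean inside = false;
        try {
            while (sr.hasNext()) {
                int event = sr.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(sr.getLocalName())) {
                    inside = true;
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && elementName.equals(sr.getLocalName())) {
                    break; // the element is finished; the rest of the document is not needed
                } else if (inside && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)) {
                    // Same idea as the javadoc loop: copy the current event slice by slice.
                    int textLength = sr.getTextLength();
                    for (int sourceStart = 0; sourceStart < textLength; sourceStart += buf.length) {
                        int nCopied = sr.getTextCharacters(sourceStart, buf, 0, buf.length);
                        out.write(buf, 0, nCopied);
                        total += nCopied;
                    }
                }
            }
        } finally {
            sr.close();
        }
        return total;
    }
}
```

The Writer passed in could be one that feeds a base64 decoder and then a FileOutputStream, so that each slice is decoded and written out before the next one is read.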
