
Remove junk trailing xml from an inputstream

My free webhost appends analytics javascript to all PHP and HTML files. Which is fine, except that I want to send XML to my Android app, and it's invalidating my files.

Since the XML is parsed in its entirety (and blows up) before being passed along to my SAX ContentHandler, I can't just catch the exception and continue merrily along with a fleshed-out object. (Which I tried, and then felt sheepish about.)

Any suggestions on a reasonably efficient strategy?

I'm about to create a class that will take my InputStream, read through it until I find the junk, break, then take what I just wrote to, convert it back into an InputStream and pass it along like nothing happened. But I'm worried that it'll be grossly inefficient, have bugs I shouldn't have to deal with (eg breaking on binary values such as embedded images) and hopefully unnecessary.

FWIW, this is part of an Android project, so I'm using the android.util.Xml class (see source code). When I traced the exception, it took me to a native appendChars function that is itself being called from a network of private methods anyway, so subclassing anything seems to be unreasonably useless.

Here's the salient bit from my stacktrace:

E/AndroidRuntime(  678): Caused by: org.apache.harmony.xml.ExpatParser$ParseException: At line 3, column 0: junk after document element
E/AndroidRuntime(  678):    at org.apache.harmony.xml.ExpatParser.parseFragment(ExpatParser.java:523)
E/AndroidRuntime(  678):    at org.apache.harmony.xml.ExpatParser.parseDocument(ExpatParser.java:482)
E/AndroidRuntime(  678):    at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:320)
E/AndroidRuntime(  678):    at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:277)

I guess in the end I'm asking for opinions on whether the InputStream -> manually parse to OutputStream -> recreate InputStream -> pass along solution is as horrible as I think it is.

Free web hosts have this issue. I have yet to find an alternative that is still free.

"I'm about to create a class that will take my InputStream, read through it until I find the junk, break, then take what I just wrote to, convert it back into an InputStream and pass it along like nothing happened. But I'm worried that it'll be grossly inefficient, have bugs I shouldn't have to deal with (eg breaking on binary values such as embedded images) and hopefully unnecessary."

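That'll work. You can read the data into a buffer (a StringBuffer, or a ByteArrayOutputStream if you want to stay with raw bytes), cut off the junk, and then wrap the result in a ByteArrayInputStream or something similar (such as a StringReader, if a Reader is applicable).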

http://developer.android.com/reference/java/io/ByteArrayInputStream.html

The downside is that you're reading the entire XML file into memory; for large files, that can be inefficient memory-wise.
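The buffering approach could look roughly like the sketch below. The class name XmlTrimmer, the method trimAfter and the closing tag passed to it (for example "</root>") are made up for illustration; it also assumes an ASCII-compatible encoding and that the appended junk never contains the closing tag itself. The bytes are never re-encoded, so embedded binary values should pass through untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class XmlTrimmer {

    /**
     * Buffers the whole stream and returns a new stream that ends right after
     * the last occurrence of closingTag (hypothetical, e.g. "</root>").
     */
    static InputStream trimAfter(InputStream in, String closingTag) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        byte[] all = out.toByteArray();

        // ISO-8859-1 maps every byte to exactly one char, so an index in the
        // string equals a byte offset in the array. Assumption: the document
        // uses an ASCII-compatible encoding and closingTag is plain ASCII.
        String text = new String(all, StandardCharsets.ISO_8859_1);
        int pos = text.lastIndexOf(closingTag);
        if (pos < 0) {
            throw new IOException("closing tag not found: " + closingTag);
        }
        // Expose only the bytes up to and including the closing tag.
        return new ByteArrayInputStream(all, 0, pos + closingTag.length());
    }
}
```

The returned stream can then be handed to the parser in place of the original one.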

Alternatively, you can subclass FilterInputStream (a plain InputStream subclass has no super.read() to call, since read() is abstract there) and do the filtering in the stream itself. You'd probably just need to override the read() methods, delegate to the wrapped stream, flag when you've reached the garbage at the end, and return EOF (-1) from then on.
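A minimal sketch of that streaming filter, under the same assumptions as before: the class name StopAfterTagStream is hypothetical, the caller knows the closing tag of the root element (for example "</root>"), no nested element shares the root's name, and the encoding is ASCII-compatible. It reports EOF as soon as the closing tag has gone by, so nothing beyond a few ints is buffered.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StopAfterTagStream extends FilterInputStream {

    private final byte[] tag;      // closing tag to watch for; must be non-empty
    private int matched = 0;       // how many bytes of the tag matched so far
    private boolean done = false;  // true once the tag or the real EOF was seen

    StopAfterTagStream(InputStream in, String closingTag) {
        super(in);
        this.tag = closingTag.getBytes(StandardCharsets.US_ASCII);
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public int read() throws IOException {
        if (done) {
            return -1;
        }
        int b = in.read();
        if (b < 0) {
            done = true;
            return -1;
        }
        // Simple matcher: a closing tag starts with '<' and contains no other
        // '<', so restarting at 0 or 1 after a mismatch is sufficient.
        if (b == tag[matched]) {
            matched++;
        } else {
            matched = (b == tag[0]) ? 1 : 0;
        }
        if (matched == tag.length) {
            done = true; // this byte completes the tag; everything after is junk
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int b = read(); // route bulk reads through the same matcher
            if (b < 0) {
                break;
            }
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }
}
```

Because this pulls one byte at a time from the wrapped stream, it is probably worth wrapping the network stream in a BufferedInputStream first.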

"I'm about to create a class that will take my InputStream, read through it until I find the junk, break, then take what I just wrote to, convert it back into an InputStream and pass it along like nothing happened. But I'm worried that it'll be grossly inefficient, have bugs I shouldn't have to deal with (eg breaking on binary values such as embedded images) and hopefully unnecessary."

You could use a FilterInputStream for that; there's no need to buffer the whole document.

The best thing to do is to add a delimiter to the end of the XML, like --theXML ends HERE --, or a character not found in XML, such as a group of 16 \u0004 chars (you then only need to check every 16th byte), and read until you find it.
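To see why sampling every 16th byte is enough: a run of 16 consecutive delimiter bytes always covers exactly one index that is a multiple of 16, so a stride-16 scan cannot step over it. The sketch below shows this on a plain byte array; the names DelimiterScan and findDelimiterStart are hypothetical, and it assumes the payload itself never contains the byte 0x04.

```java
final class DelimiterScan {

    static final byte DELIM = 0x04; // assumed never to occur in the XML itself
    static final int RUN = 16;      // length of the delimiter run

    /**
     * Returns the index of the first delimiter byte, or -1 if no sampled
     * position holds a delimiter (so no full run of 16 is present).
     */
    static int findDelimiterStart(byte[] data) {
        for (int i = 0; i < data.length; i += RUN) {
            if (data[i] == DELIM) {
                // We landed somewhere inside the run; back up to its start.
                while (i > 0 && data[i - 1] == DELIM) {
                    i--;
                }
                return i;
            }
        }
        return -1;
    }
}
```

Everything before the returned index is the document; everything from that index on can be discarded.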

An implementation, assuming the \u0004 delimiter:

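import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class WebStream extends FilterInputStream {

    private static final byte DELIM = 0x04;

    private final byte[] buff = new byte[1024];
    private int offset = 0, length = 0;
    private boolean done = false;// set at eof or once the delimiter is seen

    public WebStream(InputStream i) {
        super(i);
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public int read() throws IOException {
        if (offset == length)
            readNextChunk();
        if (offset == length)
            return -1;// eof
        return buff[offset++] & 0xFF;// mask so bytes >= 0x80 don't look like eof
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0)
            return 0;
        if (offset == length)
            readNextChunk();
        if (offset == length)
            return -1;// eof

        int cop = Math.min(len, length - offset);
        System.arraycopy(buff, offset, b, off, cop);
        offset += cop;
        return cop;
    }

    // only called once the buffer has been fully consumed
    private void readNextChunk() throws IOException {
        offset = 0;
        length = 0;
        if (done)
            return;// never read past the delimiter into the junk
        int read = in.read(buff, 0, buff.length);
        if (read <= 0) {
            done = true;
            return;
        }
        length = read;

        // note that this is assuming ascii compatible
        // anything like utf16 or utf32 will break here
        // a run of 16 delimiter bytes always covers one sampled index
        for (int i = 0; i < read; i += 16) {
            if (buff[i] == DELIM) {
                truncateAt(i);
                return;
            }
        }
        // the run may be split across two reads, so also check the last byte
        if (buff[read - 1] == DELIM)
            truncateAt(read - 1);
    }

    // i points into the delimiter run; back up to its first byte and stop there
    private void truncateAt(int i) {
        while (i > 0 && buff[i - 1] == DELIM)
            i--;
        length = i;
        done = true;
    }

}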

Note that this is only a sketch: it has minimal error checking and needs proper debugging before use.
