
Large file reading inconsistency in Java

I've been stuck on a problem for some days, and I have no idea what to do.

The goal is to upload a file to Google Cloud Storage. Because the file is large and I want some efficiency, I read it in a separate thread and pre-slice it into chunks of 2 MB. These chunks are stored in a small queue (about 2 to 5 slots) and can be accessed by my uploader class (the one that makes the PUT requests).

But (because there's always a "but") the chunks are not consistent on every computer. I tried many things: BufferedInputStream, PushbackInputStream, FileChannel (with or without MappedByteBuffer). Nothing helps: on affected computers the read fails somewhere along the way, and the last part (which is smaller than a normal chunk) is always larger than expected, so the total number of bytes read exceeds the originally calculated file size.

I don't know why, but on a significant number of computers the file seems to grow during reading. Did I miss something? What am I doing wrong? Can I truncate the extra bytes? And what should I do if the file is suddenly smaller than expected? I'm out of ideas, so I'm asking for yours :)

Oh, one constraint: because the upload must be resumable, I have to be able to move back in my reading, which limits the classes I can use (mark must be supported, or position in the case of FileChannel).

If you have any advice on CPU and memory optimisation, it's welcome too :) (This is not all of the code, but the rest is just a BlockingQueue implementation with q)

Here is a paste of my Reader: http://paste.awesom.eu/Teraglehn/pw09&ln

And the interesting part:

public void run() {
    try {
        byte[] chunk = new byte[chunkSize];
        int read;
        int r;
        long skipped;
        while (!shouldStop && !finishReading && !stopped) {
            if(size()>=maxSize){
                continue;
            }
            read = 0;
            System.out.println("[available1] "+available);
            System.out.println("[available2] "+inputStream.available());
            if(pendingFix !=0){
                System.out.println(String.format("Fix of %d bytes asked", pendingFix));
                clear();
                if (pendingFix > 0 ) {
                    pendingFix = Math.min(pendingFix, (int) available);
                    skipped = inputStream.skip((long) pendingFix);
                    if(skipped != pendingFix){
                        throw new IOException(String.format("Ask fix of %d bytes has not been completely done (%d bytes actually skipped for unknown reason)", pendingFix, skipped));
                    }
                    incrementCursor(pendingFix);
                }else {
                    decrementCursor(Math.min(cursor, -pendingFix));
                    inputStream.reset();
                    skipped = inputStream.skip(cursor);
                    if(skipped != cursor){
                        throw new IOException(String.format("Ask fix of %d bytes has not been completely done (%d bytes actually back skipped for unknown reason)", pendingFix, cursor-skipped));
                    }
                }
                pendingFix = 0;
            }
            while(read < chunkSize){
                r = inputStream.read(chunk, read, chunkSize-read);
                if(r<0) {
                    read = (read > 0)? read : r;
                    break;
                }
                else {
                    read +=r;
                }
            }

            if(pendingFix!=0) continue;
            if(read != chunkSize){ // Probably end of file
                if(read == -1){
                    finishReading = true;
                }else if(available == read){
                    System.out.println("Partial chunk (end)");
                    incrementCursor(read);
                    put(Arrays.copyOfRange(chunk, 0, read));
                    finishReading = true;
                }else {
                    throw new IOException(String.format("Only %d bytes have been read on %d bytes asked for unknown reason, %d bytes available", read, chunkSize, available));
                }
            }else {
                System.out.println("Full chunk (running)");
                put(chunk.clone());
                incrementCursor(read);
            }
        }
    }catch(IOException e){
        this.interrupt();
        errors.add(e);
        e.printStackTrace();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    stopped = true;
}

PS: There's something funny about all this. I upload session by session, and a session is a folder with one or more large files. It's always the last file that fails...

You've created a mess that works when everything happens a certain way, but if something doesn't do what you expect, it will fail. You're using available(), which is most likely wrong or at least useless.

Your read loop is also wrong, since it's filling the chunk array, but assuming that each read fills it completely (if not, the previous bytes are overwritten).
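For reference, here is a minimal sketch of a read loop that cannot overwrite earlier bytes: every call to read() is given the running offset, so a short read just continues where the previous one stopped. The class name ChunkReads and the method readChunk are made up for this illustration; they are not from the question's Reader. On Java 9 and later, InputStream.readNBytes(byte[], int, int) does the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkReads {
    /**
     * Fills buf from the stream, looping until the buffer is full or the
     * stream ends. Returns the number of bytes stored (0 at end of stream).
     * Hypothetical helper, written for this example only.
     */
    public static int readChunk(InputStream in, byte[] buf) throws IOException {
        int filled = 0;
        while (filled < buf.length) {
            // Pass the current offset so bytes from earlier reads are kept.
            int r = in.read(buf, filled, buf.length - filled);
            if (r < 0) {
                break; // end of stream: return what we have so far
            }
            filled += r;
        }
        return filled;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        byte[] buf = new byte[4];
        int n;
        while ((n = readChunk(in, buf)) > 0) {
            System.out.println("chunk of " + n + " bytes"); // 4, then 4, then 2
        }
    }
}
```

A return value smaller than buf.length then means exactly one thing, that the stream ended, and no call to available() is needed to interpret it.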

Your // Probably end of file comment means that you have a logic problem. So I'd recommend writing out the logic in plain English, then rewriting the code.
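As one possible shape for such a rewrite (a sketch under assumptions, not the code from the question): take the file size once when the file is opened, derive each chunk's offset and length from that size, and read by absolute position with FileChannel. Reads then never go past the recorded size, a file that turns out shorter is reported explicitly, and resuming is just asking for an earlier chunk index again. The class PositionalChunkReader and its methods are hypothetical names for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalChunkReader implements AutoCloseable {
    private final FileChannel channel;
    private final long fileSize;  // snapshot taken once, when the file is opened
    private final int chunkSize;

    public PositionalChunkReader(Path file, int chunkSize) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.fileSize = channel.size();
        this.chunkSize = chunkSize;
    }

    /** Number of chunks, counting a shorter last chunk if there is one. */
    public long chunkCount() {
        return (fileSize + chunkSize - 1) / chunkSize;
    }

    /** Reads chunk number index (0-based). Only the last chunk may be shorter. */
    public byte[] readChunk(long index) throws IOException {
        long offset = index * chunkSize;
        if (index < 0 || offset >= fileSize) {
            throw new IllegalArgumentException("no such chunk: " + index);
        }
        // Never read past the size recorded at open time.
        int length = (int) Math.min(chunkSize, fileSize - offset);
        ByteBuffer buf = ByteBuffer.allocate(length);
        while (buf.hasRemaining()) {
            // Positional read: does not depend on, or move, the channel position.
            int r = channel.read(buf, offset + buf.position());
            if (r < 0) {
                throw new IOException("file is shorter than the " + fileSize
                        + " bytes recorded at open time");
            }
        }
        return buf.array();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunks", ".bin");
        Files.write(tmp, new byte[10]);
        try (PositionalChunkReader reader = new PositionalChunkReader(tmp, 4)) {
            for (long i = 0; i < reader.chunkCount(); i++) {
                System.out.println("chunk " + i + ": " + reader.readChunk(i).length + " bytes");
            }
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Because every chunk is addressed by index, the uploader can re-request a chunk after a failed PUT without mark/reset or skip on a stream. Whether the size should be fixed at open time, rather than re-checked later, depends on whether anything else may still be writing to the file, which the question does not say.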

