
How to read a huge file in Java in chunks without being blocked?

Say you have a file that is larger than the memory you have available. You'd like to read the file n bytes at a time and not get blocked in the process:

  • read a block
  • pass it to a thread
  • read another block
  • pass it to a thread

I tried different things with varying success; however, blocking always seems to be the issue.

Please provide an example of a non-blocking way to gain access to, say, a byte[].

You can't.

You will always block while waiting for the disk to provide you with data. If you have a lot of work to do with each chunk of data, then using a second thread may help: that thread can perform CPU-intensive work on the data while the first thread is blocked waiting for the next read to complete.

But that doesn't sound like your situation.

Your best bet is to read data in as large a block as you possibly can (say, 1MB or more). This minimizes the time blocked in the kernel, and may result in less time waiting for the disk (if the blocks being read happen to be contiguous).


Here's the code:

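import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// (inside a method that declares "throws IOException")
ExecutorService exec = Executors.newFixedThreadPool(1);

// use RandomAccessFile because it supports readFully()
try (RandomAccessFile in = new RandomAccessFile("myfile.dat", "r"))
{
    long length = in.length();
    while (in.getFilePointer() < length)
    {
        // read about 1 MB per chunk; the last chunk may be smaller
        int readSize = (int)Math.min(1000000, length - in.getFilePointer());
        final byte[] data = new byte[readSize];
        in.readFully(data);   // blocks until the whole chunk has been read
        exec.execute(new Runnable()
        {
            public void run()
            {
                // do something with data
            }
        });
    }
}
finally
{
    exec.shutdown();   // queued chunks are still processed; the worker thread then exits
}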
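
One caveat with the loop above: Executors.newFixedThreadPool() queues tasks without limit, so if the worker is slower than the disk, unprocessed chunks pile up in memory, which matters when the file is bigger than RAM. Below is a minimal sketch of one way to cap that, assuming a Semaphore is an acceptable throttle. The class name BoundedChunkReader and the parameters chunkSize, maxInFlight and worker are made up for illustration, not part of the original answer.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class BoundedChunkReader {
    // Hypothetical helper: reads the file in chunks of at most chunkSize bytes
    // and hands each chunk to worker on a background thread, while keeping at
    // most maxInFlight chunks alive (queued or being processed) at any time.
    static void process(String path, int chunkSize, int maxInFlight, Consumer<byte[]> worker)
            throws IOException, InterruptedException {
        ExecutorService exec = Executors.newFixedThreadPool(1);
        Semaphore permits = new Semaphore(maxInFlight);
        try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
            long length = in.length();
            while (in.getFilePointer() < length) {
                permits.acquire();   // reader waits here if the worker has fallen behind
                int readSize = (int) Math.min(chunkSize, length - in.getFilePointer());
                final byte[] data = new byte[readSize];
                in.readFully(data);  // still a blocking read
                exec.execute(() -> {
                    try {
                        worker.accept(data);
                    } finally {
                        permits.release();   // chunk is done, free a slot
                    }
                });
            }
        } finally {
            exec.shutdown();
            // Assumption: one hour is a generous upper bound for the remaining work.
            exec.awaitTermination(1, TimeUnit.HOURS);
        }
    }
}
```

Usage would look something like BoundedChunkReader.process("myfile.dat", 1000000, 4, chunk -> handle(chunk)), where handle is whatever you do with the bytes. The reader thread still blocks on the disk; the semaphore only keeps it from running too far ahead of the worker.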

It sounds like you are looking for Streams, buffering, or some combination of the two (BufferedInputStream anyone?).

Check this out: http://docs.oracle.com/javase/tutorial/essential/io/buffers.html

This is the standard way to deal with very large files. I apologize if this isn't what you were looking for, but hopefully it'll help get the juices flowing anyway.
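
For what it's worth, here is a rough sketch of what the buffered-stream approach could look like. The class and method names (BufferedChunkReader, readInChunks) and the 1 MB buffer size are my own placeholders, and the "processing" is just counting bytes. Note that read() still blocks while the disk delivers data; buffering only cuts down on the number of trips to the OS.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedChunkReader {
    // Hypothetical helper: streams the file through a fixed-size buffer so that
    // memory use stays at roughly chunkSize no matter how large the file is.
    // chunkSize must be greater than zero. Returns the total number of bytes read.
    static long readInChunks(String path, int chunkSize) throws IOException {
        long total = 0;
        byte[] buffer = new byte[chunkSize];
        // 1 << 20 (1 MB) is an assumed size for the stream's internal buffer.
        try (InputStream in = new BufferedInputStream(new FileInputStream(path), 1 << 20)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                // Only buffer[0..n) holds valid data for this pass; process it here.
                total += n;
            }
        }
        return total;
    }
}
```

Because the same buffer is reused on every pass, copy the bytes out first if you want to hand a chunk to another thread.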

Good luck!

If you have a program that does I/O and CPU computations, blocking is inevitable (somewhere in your program) if on average the amount of CPU time it takes to process a byte is less than the time to read a byte.

If you try to read from a file and that requires a disk seek, the data might not arrive for 10 ms. A 2 GHz CPU could have done 20 million clock cycles of work in that time.
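
The arithmetic behind that figure is just clock rate times wait time. A tiny sketch, with a made-up class and method name, in case you want to plug in your own numbers:

```java
class SeekCost {
    // Clock cycles that tick by while the CPU waits waitMillis milliseconds,
    // for a CPU running at clockHz cycles per second.
    static long cyclesDuringWait(long clockHz, long waitMillis) {
        return clockHz / 1000 * waitMillis;   // cycles per millisecond * milliseconds
    }
}
```

With the numbers above, SeekCost.cyclesDuringWait(2_000_000_000L, 10) gives 20,000,000 cycles, so unless each chunk needs at least that much processing, the thread doing the reading spends most of its time waiting.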
