Resume read of huge text file in Java

I am reading a huge text file of words (one word per line), but I have to stop from time to time and resume the read the next day. Right now I'm using Apache's LineIterator, but it's totally the wrong solution. My file is 7 GB and I had to interrupt reading it at around 1 GB. To resume the read, I saved the number of lines already read. This means I have an if statement inside the while loop that skips lines until that count is reached. Apache's FileUtils doesn't allow seeking, so that was my workaround.

What is the best/fastest solution? I thought of using RandomAccessFile to get to the right line and continue reading, but I'm not sure whether I can seek to the right place, nor how to save the position I last read. I can re-read a couple of lines, so precision is not that important, but I haven't found a way to get the file pointer. I have a BufferedReader to read the file and a RandomAccessFile to seek to the right place, but I don't know how to periodically save a position with the BufferedReader. Any hints?

Code (note the "SOMETHING" placeholder, where I should print the value I can later use as seekToByte):

    try {
        RandomAccessFile rand = new RandomAccessFile(file, "r");
        rand.seek(seekToByte);
        startAtByte = rand.getFilePointer();
        rand.close();
    } catch (IOException e) {
        // do something
    }

    // Do it using the BufferedReader
    BufferedReader reader = null;
    FileReader freader = null;
    try {
        freader = new FileReader(file);
        reader = new BufferedReader(freader);
        // Note: Reader.skip() counts characters, not bytes
        reader.skip(startAtByte);

        long i = 0;
        for (String line; (line = reader.readLine()) != null; ) {
            lines.add(line);
            i++; // number of lines read in this run
            System.out.print(i + " ");
            if (lines.size() > 1000) {
                commit(lines);
                System.out.println("");
                lines.clear();
                System.out.println(SOMETHING?);
            }
        }
    } catch (Exception e) {
        // handle this
    } finally {
        if (reader != null) {
            try { reader.close(); } catch (Exception ignore) {}
        }
    }

RandomAccessFile is indeed one way to go. With file being the RandomAccessFile, use

long position = file.getFilePointer();

when you stop reading, to save where you are in the file, and then restore that position with

file.seek(position);

to resume reading at the same place.

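However, be careful when using RandomAccessFile, as its readLine method does not fully support Unicode: it turns each byte into one character, so text in a multi-byte encoding such as UTF-8 comes back garbled unless you decode the bytes yourself.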
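
As a rough illustration of the approach described above, here is a minimal sketch that seeks to a saved byte offset, reads a batch of lines, and returns the offset to resume from. The class name ResumableLineReader, the Batch holder, and the readLines method are made up for this example, and it assumes the file is UTF-8 encoded. Because readLine maps each byte to one character, the sketch converts each line back to bytes with ISO-8859-1 and decodes those bytes as UTF-8.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class ResumableLineReader {

    /** Lines read in one session plus the byte offset at which to resume. */
    static final class Batch {
        final List<String> lines;
        final long nextPosition;

        Batch(List<String> lines, long nextPosition) {
            this.lines = lines;
            this.nextPosition = nextPosition;
        }
    }

    /**
     * Reads up to maxLines lines starting at byte offset startAtByte.
     * Assumption: the file is UTF-8 and startAtByte is a line boundary
     * (0, or a nextPosition returned by an earlier call).
     */
    static Batch readLines(String path, long startAtByte, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(startAtByte);
            String raw;
            while (lines.size() < maxLines && (raw = raf.readLine()) != null) {
                // readLine() yields one char per byte; recover the bytes and decode them as UTF-8.
                byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
                lines.add(new String(bytes, StandardCharsets.UTF_8));
            }
            // The file pointer now sits just after the last line that was read.
            return new Batch(lines, raf.getFilePointer());
        }
    }
}
```

Persisting nextPosition (for example, after each commit) and passing it back as startAtByte on the next run resumes exactly at the next unread line. One caveat: RandomAccessFile does no buffering of its own, so readLine is likely to be slow on a multi-gigabyte file; a buffered stream that counts the bytes it consumes may be worth considering if throughput matters.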

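Could you somehow use predetermined offsets, for example cutting the file into four parts, (offset 0, offset 1), (offset 1, offset 2), and so on, and use RecursiveAction (the ForkJoin API) to take advantage of parallelism?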
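
A minimal sketch of that idea, under some assumptions: the class ChunkedLineCounter, its nested ChunkTask, and the countLines method are names invented here, and counting lines stands in for whatever per-line work is really needed. Each task owns the lines that begin inside its byte range; a task whose range starts mid-line first discards the remainder of that line, since the previous task reads it in full.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class, not part of any library.
final class ChunkedLineCounter {

    /** Processes every line that begins in the byte range [start, end). */
    static final class ChunkTask extends RecursiveAction {
        private final String path;
        private final long start;
        private final long end;
        private final AtomicLong counter;

        ChunkTask(String path, long start, long end, AtomicLong counter) {
            this.path = path;
            this.start = start;
            this.end = end;
            this.counter = counter;
        }

        @Override
        protected void compute() {
            try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
                if (start > 0) {
                    // Step back one byte and discard up to the next line break,
                    // so reading continues at the first line beginning at or after start.
                    raf.seek(start - 1);
                    raf.readLine();
                }
                while (raf.getFilePointer() < end && raf.readLine() != null) {
                    counter.incrementAndGet(); // placeholder for real per-line work
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Splits the file into parts byte ranges of roughly equal size and processes them in parallel. */
    static long countLines(String path, int parts) throws IOException {
        long length;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            length = raf.length();
        }
        AtomicLong counter = new AtomicLong();
        List<ChunkTask> tasks = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            tasks.add(new ChunkTask(path, length * i / parts, length * (i + 1) / parts, counter));
        }
        ForkJoinTask.invokeAll(tasks); // runs the tasks and waits for all of them
        return counter.get();
    }
}
```

Calling countLines(path, 4) corresponds to the four-part split suggested above. Whether this is actually faster depends on the storage device and on how expensive the per-line work is, so it is worth measuring before committing to it.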
