
How to efficiently read and write to files using minimal RAM

My aim is to read from a large file, process two lines at a time, and write the result to new files. These files can get very large, from 1 GB to 150 GB in size, so I'd like to do this processing using as little RAM as possible.

The processing is very simple: each line is split on a tab delimiter, certain elements are selected, and the new string is written to the new files.

So far I have attempted using a BufferedReader to read the file and a PrintWriter to output the lines to a file:

while ((line1 = br.readLine()) != null) {
        if (!line1.startsWith("@")) {
            line2 = br.readLine();
            recordCount++;
            // split each line once rather than three times
            String[] f1 = line1.split("\t");
            String[] f2 = line2.split("\t");
            one.println(String.format("%s\n%s\n+\n%s", f1[0] + ".1", f1[9], f1[10]));
            two.println(String.format("%s\n%s\n+\n%s", f2[0] + ".2", f2[9], f2[10]));
        }
    }

I have also attempted to use Java 8 Streams to read and write the file:

stream.forEach(line -> {
        if (!line.startsWith("@")) {
            try {
                String[] f = line.split("\t");
                if (counter.getAndIncrement() % 2 == 0)
                    Files.write(path1, String.format("%s\n%s\n+\n%s", f[0] + ".1", f[9], f[10]).getBytes(), StandardOpenOption.APPEND);
                else
                    Files.write(path2, String.format("%s\n%s\n+\n%s", f[0] + ".2", f[9], f[10]).getBytes(), StandardOpenOption.APPEND);
            } catch (IOException ioe) {
                ioe.printStackTrace(); // don't silently swallow write failures
            }
        }
    });

Finally, I have tried using an InputStream and a Scanner to read the file and a PrintWriter to output the lines:

inputStream = new FileInputStream(inputFile);
    sc = new Scanner(inputStream, "UTF-8");
    String line1, line2;

    PrintWriter one = new PrintWriter(new FileOutputStream(dotOne));
    PrintWriter two = new PrintWriter(new FileOutputStream(dotTwo));

    while (sc.hasNextLine()) {
        line1 = sc.nextLine();
        if (!line1.startsWith("@")) {
            line2 = sc.nextLine();
            String[] f1 = line1.split("\t");
            String[] f2 = line2.split("\t");
            one.println(String.format("%s\n%s\n+\n%s", f1[0] + ".1", f1[9], f1[10]));
            two.println(String.format("%s\n%s\n+\n%s", f2[0] + ".2", f2[9], f2[10]));
        }
    }

The issue I'm facing is that the program seems to be holding either the data to write, or the input file's data, in RAM.

All of the above methods do work, but use more RAM than I'd like them to.

Thanks in advance,

Sam

What you did not try is a MappedByteBuffer. FileChannel.map might be usable for your purpose, since the mapped buffer is not allocated on the Java heap.

Functioning code with a self-made byte buffer would be:

try (FileInputStream fis = new FileInputStream(source);
        FileChannel fic = fis.getChannel();
        FileOutputStream fos = new FileOutputStream(target);
        FileChannel foc = fos.getChannel()) {
    ByteBuffer buffer = ByteBuffer.allocate(1024);
    while (true) {
        int nread = fic.read(buffer);
        if (nread == -1) {
            break;
        }
        buffer.flip();
        foc.write(buffer);
        buffer.clear();
    }
}

Using fic.map to map consecutive regions into OS memory seems easy, but such more complex code I would need to test first.
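As a rough sketch of that idea (untested in this answer's spirit, so treat it as a starting point): map the source file region by region with FileChannel.map and write each mapped region to the target. The class name, chunk size, and file paths below are illustrative choices, not anything from the question.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedCopy {
    // Map at most 64 MB at a time so very large files never need one huge mapping.
    private static final long CHUNK = 64L * 1024 * 1024;

    public static void copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            for (long pos = 0; pos < size; pos += CHUNK) {
                long len = Math.min(CHUNK, size - pos);
                // The mapped buffer lives in OS memory, not on the Java heap.
                MappedByteBuffer region = in.map(FileChannel.MapMode.READ_ONLY, pos, len);
                out.write(region);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        copy(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Heap usage stays bounded by the mapping bookkeeping, not the file size, though the OS page cache will still use physical memory opportunistically.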

When creating PrintWriter set autoFlush to true:

new PrintWriter(new FileOutputStream(dotOne), true)

This way the buffered data will be flushed with every println.
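Applied to the loop from the question, that could look like the sketch below. The file names, class name, and the column indices (0, 9, 10) are assumptions carried over from the question's snippets; the processing is factored into a method so it can be tested with in-memory readers and writers.

```java
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

public class SplitPairs {
    public static void process(BufferedReader br, PrintWriter one, PrintWriter two)
            throws IOException {
        String line1, line2;
        while ((line1 = br.readLine()) != null) {
            if (line1.startsWith("@")) continue; // skip header lines
            line2 = br.readLine();
            if (line2 == null) break;            // no second line of the pair
            String[] f1 = line1.split("\t");
            String[] f2 = line2.split("\t");
            one.println(String.format("%s\n%s\n+\n%s", f1[0] + ".1", f1[9], f1[10]));
            two.println(String.format("%s\n%s\n+\n%s", f2[0] + ".2", f2[9], f2[10]));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names; the second constructor argument enables autoFlush,
        // so each println is flushed instead of accumulating in the writer's buffer.
        try (BufferedReader br = new BufferedReader(new FileReader("input.txt"));
             PrintWriter one = new PrintWriter(new FileOutputStream("out.1"), true);
             PrintWriter two = new PrintWriter(new FileOutputStream("out.2"), true)) {
            process(br, one, two);
        }
    }
}
```

Only two lines are in memory at any moment, so heap usage is independent of the input file size.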
