How to read large portions of a file without exhausting memory in Rust?

I'm trying to rewrite part of the GNU coreutils 'split' tool, to split a file into multiple parts of approximately the same size.

Part of my program reads large portions of a file just to write them into another file. I don't want to load these portions into memory, because they can be anywhere from zero bytes up to several gigabytes long.

Here's an extract of the code I wrote using a BufReader:

let file = File::open("myfile.txt")?;
let mut buffer = Vec::new();
let reader = BufReader::new(&file);
let mut handle = reader.take(length);  // here length can be 10 bytes or 1 GB!
let read = handle.read_to_end(&mut buffer)?;

I feel like I'm loading the whole chunk of the file into memory because of the read_to_end(&mut buffer) call. Am I? If not, does it mean that the BufReader is doing its job, and can I just accept that it's doing some kind of magic (abstraction) that lets me "read" an entire portion of a file without really holding it in memory? Or am I misusing these concepts in my code?

Yes, you're reading the whole chunk into memory. You can inspect buffer to confirm. If it has length bytes then there you go; there are length bytes in memory. There's no way BufReader could fake that.
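To make that check concrete, here is a minimal sketch of the same take + read_to_end pattern wrapped in a helper. The name read_chunk_into_memory is made up for illustration and is not part of the question's code; the point is only that the returned Vec really holds every byte that was read.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads at most `length` bytes from `source`
/// into a fresh Vec. The whole chunk lives in memory when it returns.
fn read_chunk_into_memory<R: Read>(source: R, length: u64) -> io::Result<Vec<u8>> {
    let mut buffer = Vec::new();
    // `take` caps how much is read; `read_to_end` grows `buffer` to fit it all.
    source.take(length).read_to_end(&mut buffer)?;
    Ok(buffer)
}
```

Calling it with a length of 1 GB on a large enough file would therefore allocate about 1 GB, whether or not a BufReader sits in front of the file.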

Yes, if we look into the source of the read_to_end function we can see that the buffer you give it will be extended to hold the new data as it comes in if the available space in the vector is exhausted.

And even the docs tell us that it reads everything until EOF into the buffer:

Read all bytes until EOF in this source, placing them into buf

You can also take a look at the code presented in this question as a starting point using a BufReader:

use std::{
    fs::File,
    io::{self, BufRead, BufReader},
};

fn main() -> io::Result<()> {
    const CAP: usize = 1024 * 128;
    let file = File::open("my.file")?;
    let mut reader = BufReader::with_capacity(CAP, file);

    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            // do stuff with buffer here
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }

    Ok(())
}
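As one possible way to fill in the "do stuff with buffer here" step for the split use case, the sketch below writes each filled buffer to a destination and stops after a byte limit. The function name copy_chunk_buffered and the limit parameter are assumptions for illustration, not part of the code above.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: copies up to `limit` bytes from a buffered reader
/// to `dst`, one internal buffer at a time. Returns the bytes copied.
fn copy_chunk_buffered<R: BufRead, W: Write>(
    reader: &mut R,
    dst: &mut W,
    limit: u64,
) -> io::Result<u64> {
    let mut remaining = limit;
    while remaining > 0 {
        let used = {
            let buffer = reader.fill_buf()?;
            if buffer.is_empty() {
                break; // EOF before the limit was reached
            }
            // Never write more than what is left of this chunk.
            let n = (buffer.len() as u64).min(remaining) as usize;
            dst.write_all(&buffer[..n])?;
            n
        };
        // Only mark as consumed what was actually written.
        reader.consume(used);
        remaining -= used as u64;
    }
    Ok(limit - remaining)
}
```

Memory use is bounded by the BufReader capacity (CAP in the code above), regardless of how large limit is. Because only the written bytes are consumed, the same reader can be passed again to produce the next part.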

A better approach might be to set up an unbuffered reader and read bytes directly into a fixed-size buffer, checking that you do not exceed whatever byte or line bounds the user specified, and writing the buffer contents to the output file as you go.
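A rough sketch of that idea for a byte bound, assuming a scratch buffer of 128 KiB (an arbitrary choice) and a made-up function name copy_chunk:

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: copies up to `limit` bytes from `src` to `dst`
/// through a fixed-size scratch buffer, so memory use stays constant.
/// Returns the number of bytes copied (less than `limit` if EOF comes first).
fn copy_chunk<R: Read, W: Write>(src: &mut R, dst: &mut W, limit: u64) -> io::Result<u64> {
    const CAP: usize = 128 * 1024; // assumed buffer size, tune as needed
    let mut scratch = vec![0u8; CAP];
    let mut remaining = limit;

    while remaining > 0 {
        // Ask for no more than what is left of this chunk.
        let want = remaining.min(CAP as u64) as usize;
        let n = match src.read(&mut scratch[..want]) {
            Ok(0) => break, // EOF
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        dst.write_all(&scratch[..n])?;
        remaining -= n as u64;
    }
    Ok(limit - remaining)
}
```

Calling copy_chunk repeatedly on the same open File, with a new output file each time, yields consecutive parts. The standard library can also do the same job in one line: io::copy(&mut src.by_ref().take(limit), dst).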
