
What is the most efficient way to read a large file in chunks without loading the entire file into memory at once?

What is the most efficient general-purpose way of reading "large" files (which may be text or binary) without going into unsafe territory? I was surprised how few relevant results there were when I did a web search for "rust read large file in chunks".

For example, one of my use cases is to calculate an MD5 checksum for a file using rust-crypto (the Md5 module allows you to add &[u8] chunks iteratively).

Here is what I have, which seems to perform slightly better than some other methods like read_to_end:

use std::{
    fs::File,
    io::{self, BufRead, BufReader},
};

fn main() -> io::Result<()> {
    const CAP: usize = 1024 * 128;
    let file = File::open("my.file")?;
    let mut reader = BufReader::with_capacity(CAP, file);

    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            // do stuff with buffer here
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }

    Ok(())
}

I don't think you can write code more efficient than that. fill_buf on a BufReader over a File is basically just a straight call to read(2).

That said, BufReader isn't really a useful abstraction when you use it like that; it would probably be less awkward to just call file.read(&mut buf) directly.

I did it this way; I don't know if it is the correct approach, but it worked perfectly for me:

use std::fs::File;
use std::io::{self, Read};

fn main() -> io::Result<()> {
    const FNAME: &str = "LargeFile.txt";
    const CHUNK_SIZE: usize = 1024; // bytes read per loop iteration
    let mut limit: usize = 1024 * 1024 * 15; // read at most ~15 MiB in total
    let mut f = File::open(FNAME)?;
    let mut buffer = [0u8; CHUNK_SIZE]; // buffer holding each chunk

    // Read up to the limit, one chunk at a time.
    while limit > 0 {
        let n = f.read(&mut buffer)?;
        if n == 0 {
            break; // end of file
        }
        // Only the first n bytes of the buffer are valid data;
        // parse or process them here.
        for &byte in &buffer[..n] {
            print!("{}", byte as char);
        }
        limit = limit.saturating_sub(n);
    }
    Ok(())
}

