How do I read multiple CSV/Parquet/JSON etc files from a directory using Rust?

I am using polars with Rust and I would like to be able to read multiple csv files as input.

I found this section in the documentation that shows how to use glob patterns to read multiple files using Python, but I could not find a way to do this in Rust.

Trying the glob pattern with Rust does not work.

The code I tried was

use polars::prelude::*;

fn main() {

    let df = CsvReader::from_path("./example/*.csv").unwrap().finish().unwrap();

    println!("{:?}", df);
}

And this failed with the error

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Io(Os { code: 2, kind: NotFound, message: "No such file or directory" })', src/main.rs:26:54
stack backtrace:
   0: rust_begin_unwind

I also tried creating the path separately and confirming that it points to a directory:

use std::path::PathBuf;
use polars::prelude::*;

fn main() {

    let path = PathBuf::from("./example");
    println!("{}", path.is_dir());
    let df = CsvReader::from_path(path).unwrap().finish().unwrap();

    println!("{:?}", df);
}

It also fails with the same error.

So the question is: how do I read multiple CSV/Parquet/JSON etc. files from a directory using Rust?

The section of the documentation referenced in your question uses both the glob library and a for loop in Python.

Thus, we can write a Rust version that implements the same idea as follows:

Eager version

use std::path::PathBuf;

use glob::glob;
use polars::prelude::*;

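fn main() {
    // glob() returns Err only when the pattern itself is invalid;
    // a directory without matching files simply yields no entries.
    let csv_files = glob("my-file-path/*.csv").expect("Failed to read glob pattern");

    let mut dfs: Vec<PolarsResult<DataFrame>> = Vec::new();

    for entry in csv_files {
        // Each entry is a Result<PathBuf, GlobError>.
        dfs.push(read_csv(entry.unwrap()));
    }

    println!("dfs: {:?}", dfs);
}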

fn read_csv(filepath: PathBuf) -> PolarsResult<DataFrame> {
    CsvReader::from_path(filepath)?
        .has_header(true)
        .finish()
}
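
The `dfs` vector above holds one `PolarsResult` per file, so a failure in a single file goes unnoticed until each element is inspected. If one `Result` for the whole batch is preferable, the standard library can collapse an iterator of `Result`s into a `Result<Vec<_>, _>` that stops at the first error. The sketch below shows only that pattern and uses no polars types: `load` is a hypothetical stand-in for `read_csv`, and `load_all` is an illustrative name, not part of any library.

```rust
use std::num::ParseIntError;

/// Hypothetical stand-in for `read_csv`: a fallible per-item loader.
fn load(raw: &str) -> Result<i32, ParseIntError> {
    raw.trim().parse::<i32>()
}

/// Apply `load` to every input. Collecting into `Result<Vec<_>, _>`
/// returns the first error encountered, or all values if none failed.
fn load_all(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    inputs.iter().map(|raw| load(raw)).collect()
}
```

Under the same assumption, mapping the glob entries through `read_csv` and collecting into `PolarsResult<Vec<DataFrame>>` should follow the same shape, since `PolarsResult` is an ordinary `Result` alias.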

Lazy version

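use std::path::PathBuf;

use glob::glob;
use polars::prelude::*;

fn read_csv_lazy(filepath: PathBuf) -> PolarsResult<LazyFrame> {
    LazyCsvReader::new(filepath).has_header(true).finish()
}

fn main() {
    // Same glob pattern as in the eager version.
    let csv_files = glob("my-file-path/*.csv").expect("Failed to read glob pattern");

    let mut ldfs: Vec<PolarsResult<LazyFrame>> = Vec::new();

    for entry in csv_files {
        ldfs.push(read_csv_lazy(entry.unwrap()));
    }

    // do stuff

    // Nothing is read until collect() is called on each LazyFrame.
    for f in ldfs {
        println!("{:?}", f.unwrap().collect());
    }
}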
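
If adding the glob crate is not an option, the file list can also be built with the standard library alone. The following is a minimal sketch, not part of the original answer: `list_csv_files` is a hypothetical helper that scans a single directory (no recursion) and keeps regular files whose extension is `csv`, ignoring case. The resulting paths could then be passed to `read_csv` or `read_csv_lazy` in place of the glob entries.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: return the regular files directly inside `dir`
/// whose extension is `csv` (case-insensitive), sorted for a stable order.
fn list_csv_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Compare the extension without regard to ASCII case.
        let is_csv = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map_or(false, |ext| ext.eq_ignore_ascii_case("csv"));
        // Skip directories, even if their name ends in ".csv".
        if is_csv && path.is_file() {
            files.push(path);
        }
    }
    // read_dir gives no ordering guarantee, so sort explicitly.
    files.sort();
    Ok(files)
}
```

Unlike the literal string `"./example/*.csv"` in the question, which the reader treats as a single file name, this returns one concrete path per file.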
