
Is there any way to implement the Send trait for ZipFile?

I want to read a .zip file in a different thread by using the zip crate.

extern crate zip;

use zip::ZipArchive;
use zip::read::ZipFile;
use std::fs::File;
use std::io::BufReader;
use std::thread;

fn compute_hashes(mut file: ZipFile) {
    let reader_thread= thread::spawn(move || {
        let mut reader = BufReader::new(file);
        /* ... */
    });
}

fn main() {
    let mut file = File::open(r"facebook-JakubOnderka.zip").unwrap();
    let mut zip = ZipArchive::new(file).unwrap();

    for i in 0..zip.len() {
        let mut inside = zip.by_index(i).unwrap();

        if !inside.name().ends_with("/") { // skip directories
            println!("Filename: {}", inside.name());
            compute_hashes(inside);
        }
    }
}

But the compiler shows me this error:

error[E0277]: the trait bound `std::io::Read: std::marker::Send` is not satisfied
  --> src/main.rs:10:24
   |
10 |     let reader_thread= thread::spawn(move || {
   |                        ^^^^^^^^^^^^^ `std::io::Read` cannot be sent between threads safely
   |
   = help: the trait `std::marker::Send` is not implemented for `std::io::Read`
   = note: required because of the requirements on the impl of `std::marker::Send` for `&mut std::io::Read`
   = note: required because it appears within the type `std::io::Take<&mut std::io::Read>`
   = note: required because it appears within the type `zip::crc32::Crc32Reader<std::io::Take<&mut std::io::Read>>`
   = note: required because it appears within the type `zip::read::ZipFileReader<'_>`
   = note: required because it appears within the type `zip::read::ZipFile<'_>`
   = note: required because it appears within the type `[closure@src/main.rs:10:38: 13:6 file:zip::read::ZipFile<'_>]`
   = note: required by `std::thread::spawn`

But the same works for the type std::fs::File. Is it necessary to fix the zip crate or is there any other method?

This is a limitation of the zip crate's API and you can't really change anything. The problem is that a ZipArchive is created by calling new and passing a reader, that is, something that implements Read and Seek. But these are the only requirements for the reader (in particular, the reader doesn't need to be Clone). Thus, the whole ZipArchive can only own one reader.

But the ZipArchive is able to produce ZipFiles, which implement Read themselves. How does that work if the whole ZipArchive only has one reader? Through sharing! The single reader is shared between the archive and all files. But this sharing is not thread-safe: each ZipFile stores a mutable reference to the reader, typed as a plain Read trait object (see the notes in the error message above). That trait object carries no Send bound, so the compiler cannot prove that a ZipFile may be moved to another thread.
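To make the Send problem concrete, here is a minimal sketch using only the standard library. Entry is a hypothetical stand-in for ZipFile, not the zip crate's real definition. It borrows its reader as a trait object, and the extra Send bound on that trait object is what lets it cross a thread boundary. The sketch uses std::thread::scope (Rust 1.63 or later), because thread::spawn would additionally require the borrow to be 'static.

```rust
use std::io::Read;

/// Hypothetical stand-in for `ZipFile` (not the zip crate's actual type).
/// It borrows a reader as a trait object. With a plain `dyn Read`, this
/// struct would not be `Send`; the `+ Send` bound below fixes that.
pub struct Entry<'a> {
    reader: &'a mut (dyn Read + Send),
}

impl<'a> Entry<'a> {
    pub fn new(reader: &'a mut (dyn Read + Send)) -> Self {
        Entry { reader }
    }
}

impl<'a> Read for Entry<'a> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.reader.read(buf)
    }
}

/// Compiles only if `T` is `Send`; useful as a compile-time check.
pub fn assert_send<T: Send>(_value: &T) {}

/// Moves the entry into a scoped thread, reads it to the end there,
/// and returns the number of bytes read.
pub fn count_bytes_on_thread(mut entry: Entry<'_>) -> std::io::Result<usize> {
    std::thread::scope(|s| {
        s.spawn(move || -> std::io::Result<usize> {
            let mut buf = Vec::new();
            entry.read_to_end(&mut buf)?;
            Ok(buf.len())
        })
        .join()
        .expect("worker thread panicked")
    })
}
```

Changing the field type to &'a mut dyn Read makes both assert_send and the scoped spawn fail to compile with an E0277 error much like the one in the question.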

This is a known issue of the crate and is being discussed on the GitHub issue tracker.


So what can you do now? Not a whole lot, but a few possibilities (as mentioned by the library author) might be OK for your use case:

  • You could decompress the whole file into memory first, then send the raw data to another thread to do calculations on it. Something like:

     // `read_to_end` requires `use std::io::Read;`
     let mut data = Vec::new();
     BufReader::new(file).read_to_end(&mut data)?;
     let reader_thread = thread::spawn(move || {
         // Do stuff with `data`
     });

    But if you just want to compute a cheap hash function on all files, loading the contents into memory is probably slower than computing the hash on the fly and might be infeasible if your files are big.

  • You could create one ZipArchive for each thread. This might be very slow if you have many small files in your archive...
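The first option can be sketched with the standard library alone. The function below accepts any Read implementor, so under that assumption a mutable reference to a ZipFile could be passed in as well. The names spawn_sum and byte_sum are hypothetical, and the byte sum is only a placeholder for a real hash function.

```rust
use std::io::{self, Read};
use std::thread;

/// Placeholder "hash": the sum of all bytes. A real program would use a
/// proper hash function here; this is an assumption of the sketch.
pub fn byte_sum(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

/// Reads the whole entry into memory on the calling thread, then moves
/// the owned buffer into a new thread that computes the checksum.
/// Only the `Vec<u8>` crosses the thread boundary, never the reader.
pub fn spawn_sum<R: Read>(mut entry: R) -> io::Result<thread::JoinHandle<u64>> {
    let mut data = Vec::new();
    entry.read_to_end(&mut data)?;
    Ok(thread::spawn(move || byte_sum(&data)))
}
```

Calling join() on the returned handle yields the checksum. Because the buffer is an owned Vec<u8>, which is Send and 'static, the compiler accepts the spawn regardless of what kind of reader produced the data.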


A tiny hint: starting a thread costs time. You often don't want to start a thread for each unit of work, but rather maintain a fixed number of threads in a thread pool, manage work in a queue and assign work to idle worker threads. The threadpool crate might serve your needs.
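As a rough illustration of the queue-plus-workers idea, here is a hand-rolled sketch built only on the standard library; it is a stand-in for what the threadpool crate offers, not that crate's API. The function name sum_all is hypothetical, the inputs are buffers that were already read into memory, and the byte sum again stands in for a real hash.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Computes a byte sum for every buffer using a fixed number of worker
/// threads that pull jobs from a shared queue. Results are returned in
/// the same order as the input buffers.
pub fn sum_all(buffers: Vec<Vec<u8>>, workers: usize) -> Vec<u64> {
    let (job_tx, job_rx) = mpsc::channel::<(usize, Vec<u8>)>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (result_tx, result_rx) = mpsc::channel::<(usize, u64)>();

    // Start the fixed set of workers (at least one).
    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while fetching the next job.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok((index, data)) => {
                        let sum: u64 = data.iter().map(|&b| u64::from(b)).sum();
                        result_tx.send((index, sum)).unwrap();
                    }
                    // The job queue was closed: no more work.
                    Err(_) => break,
                }
            })
        })
        .collect();
    // Drop our copy so the result channel closes once all workers exit.
    drop(result_tx);

    let count = buffers.len();
    for (index, data) in buffers.into_iter().enumerate() {
        job_tx.send((index, data)).unwrap();
    }
    // Closing the job queue lets idle workers leave their loop.
    drop(job_tx);

    let mut results = vec![0u64; count];
    for (index, sum) in result_rx {
        results[index] = sum;
    }
    for handle in handles {
        handle.join().unwrap();
    }
    results
}
```

The number of threads stays constant no matter how many files the archive contains, so the per-thread startup cost is paid only once per worker.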
