

How to stream files from tarfile for reading?

I am trying to read wav files from a tarfile located in a bucket. Since there are a lot of files, I do not want to extract them first.

Instead, I would like to read the data from the tarfile and stream it to wavfile.read (from scipy.io):

with tf.gfile.Open(chunk_fp, mode='rb') as f:
    with tarfile.open(fileobj=f, mode='r|*') as tar:
        for member in ds_text.index.values:
            bytes = BytesIO(tar.extractfile(member))  # Obviously not working
            rate, wav_data = wavfile.read(bytes)
            # Do stuff with data ..

However, I am not able to get my hands on a stream for wavfile.read to work on.

Trying different things gets me different errors:

 tar.extractfile(member).seek(0)

{AttributeError}'_Stream' object has no attribute 'seekable'

 tar.extractfile(member).raw.read()

{StreamError}seeking backwards is not allowed

and so on.

Any ideas on how I can achieve this?

It turns out that I just opened the file in the wrong mode. Using r:* instead of r|* works:

with tarfile.open(fileobj=f, mode='r:*') as tar:
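
For context, 'r|*' makes tarfile treat the archive as a forward-only stream of blocks, so backward seeks fail, while 'r:*' opens it with random access and requires a seekable file object. Below is a minimal, self-contained sketch of the 'r:*' approach. It rests on some assumptions: an in-memory archive stands in for the file in the bucket, the standard-library wave module stands in for scipy.io.wavfile.read (which also accepts an open file-like object), and the helper names make_wav_bytes, make_tar_bytes and read_wavs are hypothetical, made up for this illustration.

```python
import io
import tarfile
import wave


def make_wav_bytes(n_frames=8, rate=16000):
    """Build a tiny mono 16-bit WAV file in memory (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def make_tar_bytes(files):
    """Pack a {name: bytes} mapping into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_wavs(fileobj, names):
    """Return {name: (sample_rate, n_frames)} for the requested members."""
    results = {}
    # 'r:*' gives random access, so members can be read in any order.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for name in names:
            member = tar.extractfile(name)  # seekable file-like object
            # Stand-in for: rate, wav_data = wavfile.read(member)
            with wave.open(member, "rb") as w:
                results[name] = (w.getframerate(), w.getnframes())
    return results


# Stand-in for the archive opened from the bucket.
archive = io.BytesIO(
    make_tar_bytes({"a.wav": make_wav_bytes(8), "b.wav": make_wav_bytes(4)})
)
info = read_wavs(archive, ["b.wav", "a.wav"])
```

The members are deliberately requested out of archive order here, which is the kind of access that the forward-only 'r|*' mode cannot serve.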
