I am learning machine learning and data analysis on wav files. I know that if I have the wav files directly, I can do something like this to read in the data:
import librosa
mono, fs = librosa.load('./small_data/time_series_audio.wav', sr = 44100)
Now I'm given a gz-file, "music_feature_extraction_test.tar.gz", and I'm not sure what to do with it.
I tried:
with gzip.open('music_train.tar.gz', 'rb') as f:
    for files in f:
        mono, fs = librosa.load(files, sr = 44100)
but it gives me:
TypeError: lstat() argument 1 must be encoded string without null bytes, not str
Can anyone help me out?
There are several things going on:

- A .tar.gz file is a gzip-compressed tarball. Use the tarfile module; it can read gzip-compressed files directly. You'll get an iterator over its members, each of which is an individual file.
- librosa (at least in older versions) can't read from an in-memory buffer, so you have to unpack the tar members to temporary files. The tempfile module is your friend here: a NamedTemporaryFile will provide you with a self-deleting file that you can extract to and hand to librosa.
- You probably want to implement this as a simple generator function that takes the tarfile name as its input, iterates over its members and yields what librosa.load() provides you. That way everything gets cleaned up automatically.
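The generator described above could look roughly like this. It is only a sketch: the name iter_tar_audio and its load parameter are made up for illustration, and the default loader assumes librosa is installed and that 44100 Hz (the rate from the question) is what you want.

```python
import os
import shutil
import tarfile
import tempfile


def iter_tar_audio(tar_path, load=None):
    """Yield (member_name, load(temp_path)) for each regular file in a tarball.

    `load` is any callable that takes a filesystem path (hypothetical
    parameter, added so the loader can be swapped out). If omitted,
    librosa.load(path, sr=44100) is used.
    """
    if load is None:
        import librosa  # imported lazily; only needed for the default loader

        def load(path):
            return librosa.load(path, sr=44100)

    # "r:*" lets tarfile detect the compression (gzip here) by itself.
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and so on
            src = tar.extractfile(member)
            # Keep the member's extension so the loader can guess the format.
            ext = os.path.splitext(member.name)[1]
            # delete=False so the file can be reopened by name on any OS;
            # it is removed by hand in the finally block below.
            tmp = tempfile.NamedTemporaryFile(suffix=ext, delete=False)
            try:
                with tmp:
                    shutil.copyfileobj(src, tmp)  # copies in chunks
                yield member.name, load(tmp.name)
            finally:
                os.remove(tmp.name)
```

With the default loader, each yielded value is the (samples, sample rate) pair that librosa.load returns, so usage would be along the lines of: for name, (mono, fs) in iter_tar_audio('music_train.tar.gz'): ...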
The basic loop would therefore be:

1. Open the tarball with the tarfile module.
2. For each member:
   - Create a new NamedTemporaryFile.
   - Copy the content of the tarball member to that file. You may want to use shutil.copyfileobj to avoid reading the entire wav file into memory before writing it to disk.
   - NamedTemporaryFile has a name attribute. Pass that to librosa.load.
   - yield the return value of librosa.load to the caller.

You can use PySoundFile to read from the compressed file, since it accepts file-like objects: https://pysoundfile.readthedocs.io/en/0.9.0/#virtual-io
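import io
import tarfile
import soundfile
with tarfile.open('music_train.tar.gz', 'r:gz') as tar:
    for member in tar:
        if member.isfile():
            # soundfile.read returns (data, samplerate), in that order
            mono, fs = soundfile.read(io.BytesIO(tar.extractfile(member).read()))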
Maybe you should also check if you need to resample the data before processing it with librosa: https://librosa.github.io/librosa/ioformats.html#read-specific-formats
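One way to do that check for plain PCM wav files, without loading the audio at all, is the standard-library wave module. This is just a sketch: needs_resampling is a made-up helper name, the 44100 Hz default is taken from the question, and wave only understands uncompressed WAV.

```python
import wave


def needs_resampling(wav_path, target_sr=44100):
    """Return True if the WAV file's sample rate differs from target_sr."""
    with wave.open(wav_path, "rb") as wf:
        return wf.getframerate() != target_sr
```

If it returns True, librosa.load(path, sr=44100) resamples while loading; for mono data read with soundfile, librosa.resample(mono, orig_sr=fs, target_sr=44100) should be one option.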