
How to process input data for audio classification using CNN with PyTorch?

As an engineering student working in the DSP and ML fields, I am working on an audio classification project whose inputs are short clips (4 sec.) of instruments like bass, keyboard, guitar, etc. (the NSynth Dataset by the Magenta team at Google).

The idea is to convert all the short clips (.wav files) to spectrograms or melspectrograms then apply a CNN to train the model.
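As a rough illustration of that conversion step, here is a minimal sketch using plain `torch.stft` (in practice `torchaudio.transforms.MelSpectrogram` handles loading `.wav` files and the mel filterbank directly); the parameter values are illustrative, not taken from the question:

```python
import torch

def wav_to_spectrogram(waveform: torch.Tensor, n_fft: int = 1024) -> torch.Tensor:
    """Turn a mono waveform (1-D tensor of samples) into a log-power spectrogram."""
    window = torch.hann_window(n_fft)
    # Complex STFT: shape (n_fft // 2 + 1 frequency bins, frames)
    spec = torch.stft(waveform, n_fft=n_fft, window=window, return_complex=True)
    power = spec.abs() ** 2
    # Log-scale the power values, as is common before feeding a CNN
    return torch.log(power + 1e-6)
```

The resulting 2-D tensor can be treated as a one-channel image by the CNN.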

However, my question is: since the entire dataset is large (approximately 23 GB), should I first convert all the audio files to images such as PNG and then apply the CNN? I feel like this could take a lot of time, and it would nearly double the storage space for my input data, since I would then have audio + images (maybe up to 70 GB).

Thus, I wonder if there is any workaround that can speed up the process.

Thanks in advance.

Preprocessing is totally worth it. You will very likely end up running multiple experiments before your network works the way you want, and you don't want to waste time re-computing the features every time you change a few hyper-parameters.

Rather than using PNG, I would save PyTorch tensors directly (torch.save, which uses Python's standard pickling protocol) or NumPy arrays (numpy.savez, which saves several arrays into one uncompressed zip file). If you are concerned about disk space, consider numpy.savez_compressed.
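A minimal sketch of both options; the file names are hypothetical, and a random tensor stands in for a precomputed spectrogram:

```python
import os
import tempfile

import numpy as np
import torch

# Stand-in for one precomputed mel-spectrogram (64 mel bins x 173 frames)
spec = torch.randn(64, 173)

with tempfile.TemporaryDirectory() as d:
    # Option 1: native PyTorch serialization (pickle-based)
    pt_path = os.path.join(d, "clip_0001.pt")
    torch.save(spec, pt_path)
    reloaded = torch.load(pt_path)

    # Option 2: compressed NumPy archive; one file can hold several named arrays
    npz_path = os.path.join(d, "clip_0001.npz")
    np.savez_compressed(npz_path, mel=spec.numpy())
    with np.load(npz_path) as archive:
        reloaded_np = torch.from_numpy(archive["mel"])
```

Either format round-trips the exact float values, unlike an 8-bit PNG, so no information is lost between preprocessing and training.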
