
How to process input data for audio classification using CNN with PyTorch?

As an engineering student working in the DSP and ML fields, I am working on an audio classification project whose inputs are short clips (4 s) of instruments like bass, keyboard, guitar, etc. (the NSynth Dataset by the Magenta team at Google).

The idea is to convert all the short clips (.wav files) to spectrograms or mel spectrograms and then train a CNN on them.
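For concreteness, this is roughly the conversion I have in mind, sketched with torchaudio; the file name and transform parameters (n_fft, hop_length, n_mels) are just placeholders, not settings I have committed to:

```python
import torchaudio

# A minimal sketch, assuming a 16 kHz mono NSynth clip; the file name
# below is hypothetical and the parameters are illustrative, not tuned.
waveform, sample_rate = torchaudio.load("guitar_acoustic_000-060-025.wav")

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=1024,
    hop_length=512,
    n_mels=64,
)(waveform)

# Log scaling usually helps CNNs; AmplitudeToDB converts power to dB.
log_mel = torchaudio.transforms.AmplitudeToDB()(mel)
print(log_mel.shape)  # torch.Size([1, 64, 126]) for a 4 s, 16 kHz clip
```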

However, my question is: since the entire dataset is large (approximately 23 GB), should I first convert all the audio files to images such as PNGs and then apply the CNN? I feel this could take a lot of time, and it would double the storage space for my input data, since it would then be audio + images (possibly up to 70 GB).

Thus, I wonder if there is any workaround here that can speed up the process.

Thanks in advance.

Preprocessing is totally worth it. You will very likely end up running multiple experiments before your network works the way you want it to, and you don't want to waste time re-computing the features every time you want to change a few hyper-parameters.

Rather than using PNG, I would save PyTorch tensors directly ( torch.save , which uses Python's standard pickling protocol) or NumPy arrays ( numpy.savez , which saves several arrays into a single .npz zip file). If you are concerned about disk space, you can consider numpy.savez_compressed .
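Concretely, that workflow might look like the sketch below, assuming 16 kHz mono clips and hypothetical directory names ( nsynth/audio for the .wav files, features for the cached tensors): each clip is converted to a log-mel tensor exactly once, and at training time the Dataset only loads the cached files.

```python
import glob
import os

import torch
import torchaudio

# One-off preprocessing pass; paths and transform parameters are
# illustrative assumptions, and labels are omitted for brevity.
to_log_mel = torch.nn.Sequential(
    torchaudio.transforms.MelSpectrogram(
        sample_rate=16000, n_fft=1024, hop_length=512, n_mels=64
    ),
    torchaudio.transforms.AmplitudeToDB(),
)

os.makedirs("features", exist_ok=True)
for wav_path in glob.glob("nsynth/audio/*.wav"):
    waveform, _ = torchaudio.load(wav_path)
    name = os.path.splitext(os.path.basename(wav_path))[0]
    # Convert once, reuse across every experiment afterwards.
    torch.save(to_log_mel(waveform), os.path.join("features", name + ".pt"))

# At training time, the Dataset just loads the precomputed tensors:
class MelDataset(torch.utils.data.Dataset):
    def __init__(self, feature_dir):
        self.paths = sorted(glob.glob(os.path.join(feature_dir, "*.pt")))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        return torch.load(self.paths[idx])
```

In practice you would also return an instrument label from __getitem__ , e.g. parsed from the NSynth file names or its metadata, alongside each tensor.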
