
How to train a CNN on the Common Voice dataset

I am trying to train a CNN on the Common Voice dataset. I am new to speech recognition and have not been able to find any resources on how to use the dataset with Keras. I followed this article to build a simple word classification network, but I want to scale it up with the Common Voice dataset. Any help is appreciated.

Thank you

A good starting point is to look at MFCCs (Mel-frequency cepstral coefficients). In short, these are features extracted from the audio waveform with signal processing techniques that approximate the way humans perceive sound. In Python, you can use python-speech-features to compute MFCCs.

Once you have prepared your data, you can build a CNN, for example something like this one:

[image: enter image description here]
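Since the diagram is not reproduced here, the following is only a sketch of what a small Keras CNN for this task might look like, not the architecture from the original image. It assumes each clip has already been converted to an MFCC array of shape (99, 13) and that the task is classification over a fixed set of words; the layer sizes, `num_classes=10`, and the function name `build_cnn` are placeholder choices.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn(num_frames=99, num_ceps=13, num_classes=10):
    """Small 2-D CNN that treats the MFCC array as a one-channel image."""
    model = keras.Sequential([
        keras.Input(shape=(num_frames, num_ceps, 1)),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Integer class labels are assumed, hence the sparse loss
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


cnn_model = build_cnn()

# Smoke test with a dummy batch of two all-zero "clips"
dummy_batch = np.zeros((2, 99, 13, 1), dtype="float32")
cnn_probs = cnn_model.predict(dummy_batch, verbose=0)
print(cnn_probs.shape)
```

Training would then be a call such as `cnn_model.fit(x_train[..., np.newaxis], y_train, ...)`, where `x_train` and `y_train` are hypothetical arrays of MFCC features and integer labels.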

You can also use RNNs (an LSTM or a GRU, for example), but this is a bit more advanced.
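For comparison, a recurrent variant could be sketched as follows. It reads the MFCC frames as a sequence rather than as an image. The input shape (99, 13), the 64 GRU units, `num_classes=10`, and the name `build_gru` are all assumptions for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_gru(num_frames=99, num_ceps=13, num_classes=10):
    """Small GRU classifier over a sequence of MFCC frames."""
    model = keras.Sequential([
        keras.Input(shape=(num_frames, num_ceps)),
        layers.GRU(64),  # keeps only the final hidden state
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


gru_model = build_gru()

# Smoke test with a dummy batch of two all-zero sequences
gru_probs = gru_model.predict(np.zeros((2, 99, 13), dtype="float32"), verbose=0)
print(gru_probs.shape)
```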

EDIT: A very good dataset to start with, if you want:

Speech Commands Dataset
