I am trying to train a CNN with the Common Voice dataset. I am new to speech recognition and have not been able to find any resources on how to use the dataset with Keras. I followed this article to build a simple word classification network, but I want to scale it up with the Common Voice dataset. Any help is appreciated.
Thank you
One thing you can do is look at MFCCs (Mel-frequency cepstral coefficients). In short, these are features extracted from the audio waveform using signal processing techniques that approximate the way humans perceive sound. In Python, you can use python-speech-features to compute MFCCs.
Once you have prepared your data, you can build a CNN that takes the MFCC matrix of each clip as its input.
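A minimal sketch of such a network in Keras might look like the following. This is not the original example from this answer; the layer sizes, the input shape of 99 frames by 13 coefficients, and the 10 output classes are all hypothetical values chosen for illustration, and the zero-filled batch is only there to check that the shapes line up.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical dimensions: frames per clip, MFCCs per frame, number of word classes.
NUM_FRAMES, NUM_CEPSTRA, NUM_CLASSES = 99, 13, 10


def build_cnn(input_shape=(NUM_FRAMES, NUM_CEPSTRA, 1), num_classes=NUM_CLASSES):
    """Small 2D CNN that treats the MFCC matrix as a one-channel image."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Integer labels (0 .. num_classes - 1) are assumed, hence the sparse loss.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_cnn()

# Placeholder batch of 4 clips; replace with real MFCC arrays plus a channel axis.
dummy_batch = np.zeros((4, NUM_FRAMES, NUM_CEPSTRA, 1), dtype="float32")
predictions = model.predict(dummy_batch, verbose=0)
print(predictions.shape)
```

Training would then be a call to model.fit with your stacked MFCC arrays and integer labels. The architecture is deliberately small; deeper models or batch normalization may help on a larger dataset.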
You can also use RNNs (LSTM or GRU for example), but this is a bit more advanced.
EDIT: A very good dataset to start, if you want: