Voice Recognition (with ML?), not Speech Recognition

I'm looking for sample code for voice recognition (not to be confused with speech recognition); that is, I need to build a model which can detect a certain person's voice.

I will probably end up trying to tweak the TensorFlow "Simple Audio Recognition" tutorial with my own data... is this the best course of action? Any other suggestions?

A lot depends on the specific scenario. How many training samples will you have? How many people do you intend to recognise? What is the signal-to-noise ratio? How much time would the system have to identify a person? How strict should it be?

Still, I can already tell you that starting with neural networks is a poor course of action, as you immediately forsake understanding of the domain. Troubleshooting a misbehaving neural network is far more cumbersome than with most other learning systems.

I would recommend building your own features rather than relying on an ANN from the start. I will assume for the moment that you're OK with Python (as the majority of TF users are); there are several audio feature-extraction modules available for it.

As one way to proceed, you could compute MFCCs with any of these modules and build a baseline system on them. Typically you compute 40 or more coefficients per window, and these can be visualised as spectrograms. The latter can be interpreted as images and, if you feel like it, you can apply deep learning to them (a popular choice).

Mind that "speaker recognition" is a whole field in biometric identification, and there is a plethora of papers discussing good approaches.

Speaker recognition has its own specifics compared to speech recognition. I would recommend that you start with some dedicated toolkits.

SPEAR is one such project, and it comes with ready-to-use examples.

There is also ALIZE, but it is a bit old and, in my view, more complicated to use.

HTK is speech recognition software, but it can be used for your task as well: htk-speaker-recognition. There is even a master's thesis published on this: Speaker Recognition System Using HTK.

I was building a simple speaker recognition system and indeed found that a very simple GMM-UBM model built with HTK gave the best results.
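To make the GMM-UBM idea concrete, here is a toy sketch using scikit-learn rather than HTK. The data, component counts, and the refit-instead-of-MAP-adaptation shortcut are all illustrative assumptions, not the setup described above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in feature frames (think MFCC vectors): a large pooled
# background set and a smaller set from the target speaker.
background = rng.normal(0.0, 1.0, size=(2000, 13))
target = rng.normal(1.5, 1.0, size=(500, 13))

# Universal Background Model: a GMM fit on the pooled background data.
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(background)

# Speaker model: in a real GMM-UBM system this is MAP-adapted from the
# UBM; refitting on the target frames is a rough stand-in for a sketch.
spk = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
spk.fit(target)

def llr_score(frames):
    """Average log-likelihood ratio: speaker model vs. UBM."""
    return spk.score(frames) - ubm.score(frames)
```

A test utterance from the target speaker should then score higher than one drawn from the background population, and a threshold on `llr_score` gives an accept/reject decision.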

Update:

I completely forgot about SIDEKIT. It is a nice toolkit, the successor of ALIZE. I also have a working example for it: https://www.dropbox.com/sh/iwbog5oiqhi2wo3/AACnj1Uhazqb-LQY_ztX66PDa?dl=0

For a modern NN implementation that is relatively easy to use, you can try

https://github.com/mravanelli/SincNet

You can train it on the public VoxCeleb database to get the best separation.
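Whichever system produces the speaker representations (SincNet embeddings, i-vectors, or similar), verification usually comes down to comparing two embedding vectors against a threshold, commonly via cosine similarity. A small sketch, where the threshold value is an illustrative assumption to be tuned on held-out trials:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_speaker(emb1, emb2, threshold=0.7):
    # 0.7 is a placeholder; in practice the threshold is tuned on a
    # development set, e.g. to a target equal-error rate.
    return cosine_similarity(emb1, emb2) >= threshold
```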
