
How to determine length of observation sequence for HMM in speech recognition

Original post: 2019-09-17 04:07:30. Tags: speech-recognition, speech-to-text, hidden-markov-models, markov-chains, viterbi

I'm re-learning how to use Hidden Markov Models for speech recognition and I have a question. It seems that most, if not all, discussions of HMMs consider the case of a known sequence of observations [O1, O2, O3, ..., OT], where T is a known number. However, if we were to use a trained HMM on speech in real time, or on a WAV file in which someone speaks one sentence after another, how exactly does one select the value of T? In other words, how does one know when the speaker has ended one sentence and started another? Does a practical HMM for speech recognition just use a fixed value for T and periodically recompute the optimal state sequence up to the current observation, using a fixed-size window of length T into the past? Or is there some better way of dynamically selecting T at any instant of time?

1 answer

"Does a practical HMM for speech recognition just use a fixed value for T and periodically recompute the optimal state sequence up to the current observation, using a fixed-size window of length T into the past?"

The Viterbi decoding algorithm works frame by frame, so you simply iterate over the frames as they arrive; no fixed T is needed. You can keep iterating indefinitely, until the backtracking matrix fills all available memory.
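To make the frame-by-frame idea concrete, here is a minimal sketch of a streaming Viterbi decoder in the log domain. Everything in it is an assumption made for illustration: the class name StreamingViterbi, the helper toy_decoder, the two-state silence/speech model and its probabilities are hypothetical, and a real recognizer uses a far larger search graph plus pruning. The point is only that each new frame is one update step, and that the list of backpointers grows by one entry per frame.

```python
import math


class StreamingViterbi:
    """Toy frame-by-frame Viterbi decoder (log domain, discrete observations)."""

    def __init__(self, log_init, log_trans, log_emit):
        self.log_init = log_init    # log_init[s]: log P(first state = s)
        self.log_trans = log_trans  # log_trans[p][s]: log P(s | previous state p)
        self.log_emit = log_emit    # log_emit[s][o]: log P(observation o | state s)
        self.scores = None          # best log score of any path ending in each state
        self.backptrs = []          # one row per frame after the first; grows without bound

    def push(self, obs):
        """Consume one observation (one frame) and update the best scores."""
        n = len(self.log_init)
        if self.scores is None:
            # First frame: no predecessor, so no backpointer row is stored.
            self.scores = [self.log_init[s] + self.log_emit[s][obs] for s in range(n)]
            return
        new_scores, row = [], []
        for cur in range(n):
            best_prev = max(range(n), key=lambda p: self.scores[p] + self.log_trans[p][cur])
            new_scores.append(
                self.scores[best_prev] + self.log_trans[best_prev][cur] + self.log_emit[cur][obs]
            )
            row.append(best_prev)
        self.scores = new_scores
        self.backptrs.append(row)

    def best_path(self):
        """Backtrack from the best current state to the first frame."""
        state = max(range(len(self.scores)), key=lambda s: self.scores[s])
        path = [state]
        for row in reversed(self.backptrs):
            state = row[state]
            path.append(state)
        path.reverse()
        return path


def toy_decoder():
    """Hypothetical 2-state model: state 0 = silence, state 1 = speech."""
    log = math.log
    log_init = [log(0.5), log(0.5)]
    log_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
    log_emit = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]  # obs 0 = quiet, 1 = loud
    return StreamingViterbi(log_init, log_trans, log_emit)
```

Calling push() once per incoming frame and best_path() whenever a result is wanted is enough to decode a stream of unknown length. Because backptrs is never trimmed in this sketch, memory use grows linearly with the number of frames, which is the practical limit on decoding forever.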

The training algorithm works on audio clips that are prepared before training, usually 1-30 seconds long. For training, the audio length is therefore already known.

"How does one know when the speaker has ended one sentence and started another?"

There are different strategies here. Decoders look for silence to decide where to wrap up decoding. Silence doesn't necessarily mean a break between sentences: there may be no pause between sentences at all, and there may be a pause in the middle of a sentence.

So, to find silence, the decoder can either use a standalone voice activity detection (VAD) algorithm and break when the VAD detects silence, or analyze its backtrack information to decide whether silence has appeared. The second method is a bit more reliable.
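As a rough illustration of the first strategy, the sketch below splits a stream of per-frame energies into utterances with a very simple energy-threshold VAD. The function name split_on_silence, the threshold and the minimum silence length are all assumptions chosen for the example; production VADs are considerably more robust than a fixed energy threshold. The backtrack-based method is not shown here.

```python
def split_on_silence(frame_energies, threshold=0.1, min_silence_frames=3):
    """Return (start, end) frame index pairs, end exclusive, for each speech segment.

    A segment is closed once at least `min_silence_frames` consecutive frames
    fall below `threshold`; shorter pauses are kept inside the segment.
    """
    segments = []
    start = None      # index of the first frame of the open segment, None while silent
    silence_run = 0   # number of consecutive quiet frames seen inside the open segment
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # End the segment just after its last loud frame.
                segments.append((start, i - silence_run + 1))
                start = None
                silence_run = 0
    if start is not None:
        # The stream ended while a segment was still open.
        segments.append((start, len(frame_energies) - silence_run))
    return segments
```

Each returned pair gives one observation sequence for the decoder, and its length T is simply end - start, so T is determined by the audio rather than fixed in advance.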


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 0 2019-09-17 10:22:46
