
How to determine length of observation sequence for HMM in speech recognition

I'm re-learning how to use Hidden Markov Models for speech recognition and I have a question. It seems that most/all discussions of using HMMs consider the case of a known observation sequence [O1, O2, O3,...,OT], where T is a known number. However, if we were to use a trained HMM on speech in real time, or on a WAV file where someone is speaking one sentence after another, how exactly does one select the value of T? In other words, how does one know when the speaker has ended one sentence and started another? Does a practical HMM for speech recognition just use a fixed value for T and periodically recompute the optimal state sequence up to the current observation using a fixed-size window of length T into the past? Or is there some better way of dynamically selecting T at any instant?

Does a practical HMM for speech recognition just use a fixed value for T and periodically recompute the optimal state sequence up to the current observation using a fixed-size window of length T into the past?

The Viterbi decoding algorithm works frame by frame, so you can simply iterate over frames as they arrive. In principle you can keep going indefinitely, until the backtracking matrix fills all available memory.
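As a rough illustration of the frame-by-frame idea, here is a minimal sketch of a streaming Viterbi decoder for a toy discrete-observation HMM. The class name `StreamingViterbi`, its methods, and the parameter layout are hypothetical and chosen only for this example; real decoders work on continuous acoustic features and prune the search. The point is that `step` never needs T in advance, while the stored back-pointers grow by one row per frame.

```python
class StreamingViterbi:
    """Frame-synchronous Viterbi for a discrete HMM, in the log domain.

    Hypothetical illustration: all names here are assumptions, not a real API.
    """

    def __init__(self, log_init, log_trans, log_emit):
        self.log_init = log_init    # log_init[s]: log P(first state = s)
        self.log_trans = log_trans  # log_trans[p][s]: log P(s | previous state p)
        self.log_emit = log_emit    # log_emit[s][o]: log P(symbol o | state s)
        self.n_states = len(log_init)
        self.scores = None          # best log score per state at the current frame
        self.backptrs = []          # one row of predecessors per frame after the first

    def step(self, symbol):
        """Consume one observation; T does not need to be known beforehand."""
        states = range(self.n_states)
        if self.scores is None:
            self.scores = [self.log_init[s] + self.log_emit[s][symbol] for s in states]
            return
        new_scores, ptrs = [], []
        for s in states:
            # Best predecessor for state s given the scores of the previous frame.
            best_prev = max(states, key=lambda p: self.scores[p] + self.log_trans[p][s])
            ptrs.append(best_prev)
            new_scores.append(
                self.scores[best_prev] + self.log_trans[best_prev][s] + self.log_emit[s][symbol]
            )
        self.scores = new_scores
        self.backptrs.append(ptrs)  # this list is what grows without bound

    def best_path(self):
        """Backtrack from the best current state to the first frame."""
        state = max(range(self.n_states), key=lambda s: self.scores[s])
        path = [state]
        for ptrs in reversed(self.backptrs):
            state = ptrs[state]
            path.append(state)
        path.reverse()
        return path
```

You would call `step` once per incoming frame and call `best_path` whenever you decide to stop, for example once silence has been detected.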

The training algorithm works on audio that is prepared before training, usually utterances of 1-30 seconds. For training, the audio length is therefore already known.

how does one know when the speaker has ended one sentence and started another?

There are different strategies here. Decoders typically look for silence and wrap up the current decoding pass there. Silence doesn't necessarily mean a break between sentences: there may be no pause between sentences at all, and there may be a pause in the middle of a sentence.

To find silence, the decoder can either use a standalone voice activity detection (VAD) algorithm and break when the VAD detects silence, or analyze its own backtrack information to decide whether silence has appeared. The second method is a bit more reliable.
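The first option can be sketched with a very simple energy-based endpointer. This is only an assumed toy VAD, not what any particular decoder does: the function name `find_utterance_ends`, the energy threshold, and the minimum silence length are all hypothetical parameters picked for illustration. It reports the frame at which enough consecutive low-energy frames have followed some speech, which is where a decoder could stop and backtrack.

```python
def find_utterance_ends(frame_energies, threshold, min_silence_frames):
    """Return the frame indices at which an utterance is considered finished.

    Hypothetical toy VAD: a frame counts as speech if its energy is at least
    `threshold`; an utterance ends once `min_silence_frames` consecutive
    non-speech frames follow some speech.
    """
    ends = []
    in_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            # Speech frame: (re)enter speech and reset the silence counter.
            in_speech = True
            silent_run = 0
        elif in_speech:
            # Silence after speech: count it, and end the utterance if long enough.
            silent_run += 1
            if silent_run >= min_silence_frames:
                ends.append(i)
                in_speech = False
                silent_run = 0
        # Silence before any speech is simply skipped.
    return ends
```

Requiring several silent frames in a row keeps a short pause in the middle of a sentence from ending the utterance, although, as noted above, a long enough mid-sentence pause will still trigger a break.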
