How to detect filler sounds such as um, uh, etc. using cmusphinx/mozilla deepspeech/google stt?

Question

I am working on a speech recognition project. The task is to detect filler sounds such as um, uh, and eh in audio clips of children/students speaking English. Their spoken English is not very good.

How can this be done using CMUSphinx, Mozilla DeepSpeech, Google Cloud Speech, or Kaldi? Or do I need to start from scratch?

I also went through other posts and papers on how to build an ASR system, but since this is not a long-term project, I do not have the time to build one from scratch and then see the results. I am also okay with lower accuracy for now, which I can claim to improve later.

Answer 1

Have you tried just adding the filler words to your lexicon? E.g., the CMU pronunciation dictionary has these words as entries in its published lexicon (LINK TO COMPLETE DICTIONARY).

For example, the CMU pronunciation dictionary has the following entries that correspond to filler sounds:

AH   AA1
HM   HH AH0 M
HMM  HH AH0 M
UH   AH1
UHH  AH1
UM   AH1 M
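
As a rough illustration of the lexicon idea, here is a minimal Python sketch. It assumes a plain-text dictionary with one `WORD PHONES` entry per line and a recognizer that returns its hypothesis as a list of words. The helper names `add_fillers` and `find_fillers` are hypothetical, not part of any toolkit, and the case and phone symbols would need to match whatever dictionary and acoustic model you actually use.

```python
# Filler entries copied from the list above (word -> phones).
FILLER_LEXICON = {
    "AH": "AA1",
    "HM": "HH AH0 M",
    "HMM": "HH AH0 M",
    "UH": "AH1",
    "UHH": "AH1",
    "UM": "AH1 M",
}


def add_fillers(dict_text, fillers=FILLER_LEXICON):
    """Return dictionary text with any missing filler entries appended."""
    lines = [ln for ln in dict_text.splitlines() if ln.strip()]
    # The first token on each line is the word; compare case-insensitively.
    present = {ln.split()[0].upper() for ln in lines}
    for word, phones in fillers.items():
        if word not in present:
            lines.append(f"{word} {phones}")
    return "\n".join(lines) + "\n"


def find_fillers(words, fillers=FILLER_LEXICON):
    """Return (index, word) pairs for recognized words that are fillers."""
    return [(i, w) for i, w in enumerate(words) if w.upper() in fillers]


if __name__ == "__main__":
    # Hypothetical recognizer output, used only for illustration.
    hypothesis = "i um went to uh school".split()
    print(find_fillers(hypothesis))  # [(1, 'um'), (4, 'uh')]
```

Whether the recognizer actually emits these words depends on the language model as well as the lexicon, so treat this only as a starting point.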

-1 2020-08-17 10:51:25