
Building OpenEars compatible language model

I am doing some development on speech-to-text and text-to-speech, and I found the OpenEars API very useful.

The principle of this CMU-SLM based API is that it uses a language model to map the speech heard by the iPhone device. So I decided to find a big English language model to feed the API's speech recognizer engine. But I failed to understand the format of the VoxForge English data model and how to use it with OpenEars.

Does anyone have any idea how I can get the .languagemodel and .dic files for the English language to work with OpenEars?

Regarding LM Formats:

AFAIK, most language models use the ARPA standard. Sphinx / CMU language models are compiled into a binary format, so you'd need the source (text) format to convert a Sphinx LM into another format. Most other language models are in text format.

I'd recommend using the HTK Speech Recognition Toolkit; detailed documentation here: http://htk.eng.cam.ac.uk/ftp/software/htkbook_html.tar.gz

Here's also a description of CMU's SLM Toolkit: http://www.speech.cs.cmu.edu/SLM/toolkit_documentation.html

Here's an example of a language model in ARPA format that I found on the net: http://www.arborius.net/~jphekman/sphinx/full/index.html
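
For orientation, an ARPA file is plain text: a \data\ header listing the n-gram counts, then one section per n-gram order, where each line holds a log10 probability, the n-gram itself, and an optional backoff weight. A schematic excerpt (the words and numbers here are made up, just to show the layout):

    \data\
    ngram 1=4
    ngram 2=2

    \1-grams:
    -0.6021 <s> -0.3010
    -0.6021 </s>
    -0.6021 HELLO -0.3010
    -0.6021 WORLD -0.3010

    \2-grams:
    -0.3010 <s> HELLO
    -0.3010 HELLO WORLD

    \end\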

You probably want to create an ARPA LM first, then convert it into any binary format if needed.
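
If you do need the Sphinx binary format, the sphinxbase distribution ships a converter for exactly this step. A sketch of the invocation (the file names are hypothetical; check your version's documentation for the exact options):

    # convert a text-format ARPA model into the Sphinx binary format
    sphinx_lm_convert -i mymodel.lm -o mymodel.lm.bin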

In General:

To build a language model, you need lots and lots of training data - to determine what the probability of any other word in your vocabulary is, after observing the current input up to this point in time.

You can't "make" a language model just by adding the words you want to recognize - you also need a lot of training data (i.e. the typical input you observe when running your speech recognition application).

A language model is not just a word list -- it estimates the probability of the next token (word) in the input. To estimate those probabilities, you need to run a training process, which goes over training data (e.g. historic data) and observes word frequencies there to estimate the above-mentioned probabilities.
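
As a toy illustration of what such a training pass boils down to (a minimal sketch in Python with a made-up corpus - not what the CMU or HTK tools actually run, but the same counting idea):

    from collections import Counter

    corpus = "the cat sat on the mat the cat ran".split()

    # Unigram counts -> maximum-likelihood word probabilities.
    unigrams = Counter(corpus)
    total = sum(unigrams.values())
    p_word = {w: c / total for w, c in unigrams.items()}

    # Bigram counts -> P(next word | current word), the quantity an n-gram LM stores.
    bigrams = Counter(zip(corpus, corpus[1:]))
    p_next = {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

    print(p_word["the"])           # 3/9: "the" is 3 of the 9 tokens
    print(p_next[("the", "cat")])  # 2/3: "the" is followed by "cat" 2 of 3 times

Real toolkits additionally smooth these estimates so that n-grams never seen in training don't get zero probability.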

For your problem, maybe as a quick solution, just assume all words have the same frequency / probability (a sketch follows the list below):

  1. create a dictionary with the words you want to recognize (N words in the dictionary)

  2. create a language model which has 1/N as the probability for each word (a uni-gram language model)
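
Here is what those two steps could look like - a minimal sketch that writes a uniform uni-gram model in ARPA format (the word list and file name are hypothetical; ARPA stores log10 probabilities, and this sketch counts the <s> / </s> sentence markers into N as well):

    import math

    words = ["YES", "NO", "HELLO", "GOODBYE"]   # the words you want to recognize
    n = len(words) + 2                          # +2 for the <s> and </s> markers
    logp = math.log10(1.0 / n)                  # uniform probability, as log10

    with open("uniform.arpa", "w") as f:
        f.write("\\data\\\n")
        f.write("ngram 1=%d\n\n" % n)
        f.write("\\1-grams:\n")
        for w in ["<s>", "</s>"] + words:
            f.write("%.4f %s\n" % (logp, w))
        f.write("\n\\end\\\n")

The matching .dic file is just one line per word followed by its phones (e.g. "HELLO HH AH L OW"); for English you can usually copy those pronunciations out of CMUdict rather than writing them by hand.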

You can then interpolate that uni-gram language model (LM) with another LM built from a bigger corpus, using the HTK Toolkit.

Old question, but maybe the answer is still interesting. OpenEars now has built-in language model generation, so one option is to create models dynamically in your app as you need them, using the LanguageModelGenerator class, which uses the MITLM library and NSScanner to accomplish the same task as the CMU toolkit mentioned above. Processing a corpus with >5000 words on the iPhone is going to take a very long time, but you could always use the Simulator to run it once, take the output out of the documents folder, and keep it.

Another option for large vocabulary recognition is explained here:

Creating ARPA language model file with 50,000 words

Having said that, I need to point out as the OpenEars developer that the CMU tool's limit of 5000 words corresponds pretty closely to the maximum vocabulary size that is likely to have decent accuracy and processing speed on the iPhone when using Pocketsphinx. So, the last suggestion would be to either reconceptualize your task so that it doesn't absolutely require large-vocabulary recognition (for instance, since OpenEars allows you to switch models on the fly, you may find that you don't need one enormous model but can get by with multiple smaller ones that you switch in in different contexts), or to use a network-based API that can do large-vocabulary recognition on a server (or make your own API that uses Sphinx4 on your own server). Good luck!
