How to create a new language model (NLP)? - Python

Question

I use the Google API to transcribe audio files to text with the Recognizer class. I found that only a limited number of languages are available, mostly the most common, internationally used ones. How can I create a new language from a vocabulary, train it, and then use that language as the recognizer language for audio input?

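I would like to use it the same way en-US is used as the language:

import speech_recognition as sr

r = sr.Recognizer()

# audio_text is the audio data captured earlier (e.g. with r.record or r.listen)
r.recognize_google(audio_text, language="en-US")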

Note: I have searched several times but can't seem to find an exact answer to what I need. I'm using Python.

Thank you.

Answer 1

If your question is "how do I train an ML model to do Automatic Speech Recognition (ASR) on a specific language?", you will first need a corpus of speech recordings with their respective transcripts. Then you can use, for example, SpeechBrain to train a model on that corpus.
If you just want to use ASR for your specific language, first check whether a model for it already exists.
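As a rough illustration of the "corpus with speech and transcripts" step, here is a minimal sketch that pairs WAV files with their transcripts in a JSON manifest, which is a common starting point for ASR training toolkits. It uses only the Python standard library. The function names (`build_manifest`, `save_manifest`) and the field names (`wav`, `length`, `words`) are assumptions chosen for illustration, so adapt them to whatever your training recipe expects.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def build_manifest(audio_dir, transcripts):
    """Pair each WAV file in audio_dir with its transcript.

    transcripts maps a file stem (e.g. "utt001") to its text.
    Files without a transcript are skipped, because supervised
    ASR training needs both the audio and the text.
    """
    manifest = {}
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        text = transcripts.get(wav_path.stem)
        if text is None:
            continue
        manifest[wav_path.stem] = {
            "wav": str(wav_path),  # hypothetical field name
            "length": round(wav_duration_seconds(wav_path), 3),  # seconds
            "words": text,  # hypothetical field name
        }
    return manifest


def save_manifest(manifest, out_path):
    """Write the manifest as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, ensure_ascii=False, indent=2)
```

For example, `build_manifest("recordings", {"utt001": "hello world"})` would return one entry for `recordings/utt001.wav`, if that file exists. The resulting file can then be split into training, validation, and test sets before being handed to the training toolkit.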

Answer 1 | 0 | 2021-11-08 11:08:55