Phonetic Speech Recognition

I'm trying to get Latin speech recognition working, for which I need not word recognition but phonetic vowel-and-consonant recognition (Latin has only 40 sounds, but over 40,000 words x 60 endings on average = roughly 2.5 million word-forms). The problem is that both the Web Speech API and Google Cloud Speech only hand you supposedly similar-sounding complete words (and from an English grammar, too, since there are no 2.5-million-word Latin grammars out there), so there's no way for me to get down to processing the actual phonetic sounds, in particular just the word-stem (the first half of the word), which distinguishes each word, rather than the word-ending, which (uselessly, for my purposes) tells how the word functions in the sentence. Ideally, I'd want a grammar of word-stems such as:

  • "am-" (short for amo, amare, amavi, amatus, etc.),
  • "vid-" (short for video, videre, vidi, visus, etc.),
  • "laet-" (short for laetus, laeta, laetum, etc.),
  • etc.

But speech-recognition technology can't search for that.
So where can I get phonetic speech recognition?

I prefer JavaScript, PHP, or Node, and preferably client-side rather than streaming.

Here's my code so far, for the Web Speech API. The key thing is the console.log() calls, which show me trying to dig into each returned possible word's properties:

speech.onresult = function(event) {
    var interim_transcript = '';
    var final_transcript = '';

    for (var i = event.resultIndex; i < event.results.length; ++i) {
        var result = event.results[i];
        if (result.isFinal) {
            final_transcript += result[0].transcript;

            // This console.log shows every word-guess possibility
            // (up to speech.maxAlternatives of them).
            console.log(result);

            /* Dig into each returned possibility's PROPERTIES. Alas, the only
               properties an alternative exposes (besides its prototype) are
               (1) the transcript (i.e. the guessed word) and
               (2) the confidence (i.e. the 0-to-1 likelihood of it being that word).
               An indexed loop is used because for...in also walks
               non-alternative keys such as "length" and "isFinal". */
            for (var a = 0; a < result.length; ++a) {
                console.log("%c Poss-" + a + " %c transcript: " + result[a].transcript + " | confidence: " + result[a].confidence,
                    'background-color: black; color: yellow; font-size: 14px;',
                    'background-color: black; color: red; font-size: 14px;');
            }
        } else {
            // Without this branch interim_transcript always stays empty.
            interim_transcript += result[0].transcript;
        }
    }
    if (action == "start") {
        transcription.value += final_transcript;
        interim_span.innerHTML = interim_transcript;
    }
};
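
As a rough sketch of the digging done in the handler above, the alternatives of one result can be flattened into plain records and ranked by confidence. The helper name listAlternatives and the mock result are made up for illustration; only the transcript and confidence properties and the array-like indexing come from the Web Speech API.

```javascript
// Hypothetical helper (not part of the Web Speech API): flattens one
// SpeechRecognitionResult-like object into plain {transcript, confidence}
// records, sorted from most to least confident.
function listAlternatives(result) {
    var alternatives = [];
    for (var a = 0; a < result.length; ++a) {
        alternatives.push({
            transcript: result[a].transcript.trim(),
            confidence: result[a].confidence
        });
    }
    alternatives.sort(function (x, y) { return y.confidence - x.confidence; });
    return alternatives;
}

// Mock object standing in for event.results[i]; the values are invented.
var mockResult = {
    0: { transcript: ' amo', confidence: 0.62 },
    1: { transcript: ' ammo', confidence: 0.81 },
    2: { transcript: ' a mow', confidence: 0.4 },
    length: 3,
    isFinal: true
};
var ranked = listAlternatives(mockResult);
console.log(ranked);
```

Inside the real handler this would be called as listAlternatives(event.results[i]). It still yields whole-word guesses only, not phonemes.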

You can create a SpeechGrammarList. See also JSpeech Grammar Format.

Example description and code at MDN:

The SpeechGrammarList interface of the Web Speech API represents a list of SpeechGrammar objects containing words or patterns of words that we want the recognition service to recognize.

Grammar is defined using JSpeech Grammar Format (JSGF). Other formats may also be supported in the future.

var grammar = '#JSGF V1.0; grammar colors; public <color> = aqua | azure | beige | bisque | black | blue | brown | chocolate | coral | crimson | cyan | fuchsia | ghostwhite | gold | goldenrod | gray | green | indigo | ivory | khaki | lavender | lime | linen | magenta | maroon | moccasin | navy | olive | orange | orchid | peru | pink | plum | purple | red | salmon | sienna | silver | snow | tan | teal | thistle | tomato | turquoise | violet | white | yellow ;';
// Some browsers expose these constructors only under a webkit prefix.
var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
var recognition = new SpeechRecognition();
var speechRecognitionList = new SpeechGrammarList();
// The second argument is the grammar's weight, from 0 to 1.
speechRecognitionList.addFromString(grammar, 1);
recognition.grammars = speechRecognitionList;
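
Applied to the word-stems from the question, the same idea might look like the sketch below. The helper buildStemGrammar, the grammar name latinstems, and the rule name stem are hypothetical, and the stems are written without their trailing hyphens as a guess at what a JSGF token should look like. Whether a given recognition service will match bare stems, or honor the grammar at all, is an open question, so treat this as an experiment rather than a known solution.

```javascript
// Hypothetical helper: builds a JSGF grammar string from a list of word-stems.
function buildStemGrammar(name, ruleName, stems) {
    return '#JSGF V1.0; grammar ' + name + '; public <' + ruleName + '> = ' +
        stems.join(' | ') + ' ;';
}

// Stems taken from the question, without the trailing hyphen (an assumption).
var stems = ['am', 'vid', 'laet'];
var stemGrammar = buildStemGrammar('latinstems', 'stem', stems);
console.log(stemGrammar);

// Only attempt to register the grammar where the browser API exists.
if (typeof window !== 'undefined') {
    var GrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
    if (GrammarList) {
        var stemList = new GrammarList();
        stemList.addFromString(stemGrammar, 1);
        // recognition.grammars = stemList;  // attach to an existing recognizer
    }
}
```

Generating the string from an array keeps the stem list editable in one place, which matters once it grows well past three entries.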
