
How to add grammar/hints to microsoft-cognitiveservices-speech-sdk?

I have a basic setup with the JavaScript library of microsoft-cognitiveservices-speech-sdk. I use the browser implementation, not the Node implementation. Overall it works fine, yet some issues occur where the transcription is a bit off.

Background

The project I am working on is a web application that uses speech recognition. The user interacts with the application using business codes like A6, B12, ...

I use webkitSpeechRecognition whenever possible; in any other case I provide a fallback with microsoft-cognitiveservices-speech-sdk, which works very well the majority of the time.
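Roughly, the selection between the two looks like this (a simplified sketch; the key, region and language values are placeholders):

```javascript
import * as SpeechSDK from "microsoft-cognitiveservices-speech-sdk";

function createRecognizer() {
  const NativeRecognition =
    window.SpeechRecognition || window.webkitSpeechRecognition;

  if (NativeRecognition) {
    // Preferred path: the browser's built-in Web Speech API.
    const recognition = new NativeRecognition();
    recognition.lang = "fr-FR";
    return recognition;
  }

  // Fallback path: Azure Speech SDK running in the browser.
  const speechConfig = SpeechSDK.SpeechConfig.fromSubscription(
    "YOUR_SPEECH_KEY",     // placeholder
    "YOUR_SERVICE_REGION"  // placeholder, e.g. "westeurope"
  );
  speechConfig.speechRecognitionLanguage = "fr-FR";

  const audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
  return new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);
}
```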

Issue

The business codes are not always correctly transcribed on microsoft-cognitiveservices-speech-sdk. webkitSpeechRecognition does a better job with this.

Example (in French):

  • User > A20 (pronounced "a vingt")
  • STT > Avant
  • Expected: A20

This might seem close, but it isn't, and webkitSpeechRecognition is able to solve this one correctly. According to the documentation, it seems that one can provide a dynamic grammar and suggestions/hints to help the STT, yet I wasn't able to find an example or a way to use this interface. I was wondering if someone might have a lead on this.

To elaborate a bit more, I was thinking of providing an IDynamicGrammar object, but I don't know whether this is the correct approach, nor do I know how to provide it.

Side note

  • I can use a mechanism such as ElasticSearch to find the closest matching code, yet that only takes me so far; I would really like to optimise the STT itself.
  • I cannot force all users to use Chrome.
  • I cannot change the business codes.

Reading through the article: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-phrase-lists?pivots=programming-language-javascript

The phrase list is currently applicable only to the English language.
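For reference, this is roughly how a phrase list would be attached to a recognizer with the JavaScript SDK (a minimal sketch, assuming an existing SpeechRecognizer instance; as noted, it currently only takes effect for English recognition):

```javascript
// Sketch: attach a phrase list to an existing SpeechRecognizer instance.
// Phrase lists currently only apply to English recognition.
const phraseList = SpeechSDK.PhraseListGrammar.fromRecognizer(recognizer);
phraseList.addPhrase("A6");
phraseList.addPhrase("A20");
phraseList.addPhrase("B12");
// phraseList.clear() would remove all previously added phrases.
```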

Alternatively, you could train/customize your own model.

The following article details the same:

[screenshot from the documentation]

Please note that pronunciation mappings/hints in Azure Speech to Text are currently available only for English and German at this point in time.

[screenshot from the documentation]

Reference: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-test-and-train#related-text-data-for-training
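For completeness: if English or German recognition were an option, the pronunciation data described in that article is a plain-text file with the display form, a tab character, and the spoken form on each line. The entries below are only hypothetical illustrations built from your business codes:

```text
A20	a twenty
B12	b twelve
C26	c twenty six
```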

However, I casually tried the uttered sentences ("related text" data) mentioned in the article referenced above, as this does not have any language restriction.

I created the sample sentences as related text, trained the model, and deployed it. This gave slightly better recognition of the codes/non-grammar words. Sample sentences:

  • This is A 20 Business
  • There is going to be a B 6 Business Model
  • B 6 on the other hand is not doing good as a business
  • Please indicate the C 26 profits.

Out of the Box Speech Recognition:

[screenshot: out-of-the-box transcription results]

After using the custom-trained model for speech recognition:

[screenshot: transcription results with the custom model]

Having said that, I assume that if you train the model with more data - sentences, and audio with labeled text (as this also doesn't have any language restriction) - the custom model will serve your requirement.

To consume the custom model in JavaScript, you could refer to this article:

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-specify-source-language?pivots=programming-language-more
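A minimal sketch of wiring the deployed custom endpoint into the JavaScript SDK (the key, region and endpoint ID below are placeholders):

```javascript
// Sketch: point the Speech SDK at a deployed Custom Speech endpoint.
const speechConfig = SpeechSDK.SpeechConfig.fromSubscription(
  "YOUR_SPEECH_KEY",     // placeholder
  "YOUR_SERVICE_REGION"  // placeholder
);
// Endpoint ID of the deployed custom model, copied from its deployment page.
speechConfig.endpointId = "YOUR_CUSTOM_ENDPOINT_ID";
speechConfig.speechRecognitionLanguage = "fr-FR";

const recognizer = new SpeechSDK.SpeechRecognizer(
  speechConfig,
  SpeechSDK.AudioConfig.fromDefaultMicrophoneInput()
);
```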
