I have a basic setup with the JavaScript library microsoft-cognitiveservices-speech-sdk. I use the browser implementation, not the Node implementation. Overall it works fine, but the transcription is sometimes a bit off.
The project I am working on is a web application and it uses speech recognition. The user interacts with the application with business codes like A6, B12, ...
I use webkitSpeechRecognition whenever possible; otherwise I fall back to microsoft-cognitiveservices-speech-sdk, which works very well most of the time.
The business codes are not always transcribed correctly by microsoft-cognitiveservices-speech-sdk. webkitSpeechRecognition does a better job with this.
Example (in French):
This might seem close, but it isn't; webkitSpeechRecognition is able to transcribe this one correctly. According to the documentation, it seems that one can provide a dynamic grammar and suggestions/hints in order to help the STT. Yet I wasn't able to find an example or a way to use this interface. I was wondering if some of you might have a lead on this.
To elaborate a bit more, I was thinking of providing an IDynamicGrammar object, but I don't know whether this is the correct approach, nor do I know how to provide it.
According to the article https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-phrase-lists?pivots=programming-language-javascript
the phrase list is currently applicable only to the English language.
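For a locale where phrase lists are supported, attaching the business codes as hints could look roughly like the sketch below. It assumes `sdk` is the Speech SDK namespace (for example the `SpeechSDK` global exposed by the browser bundle) and that `recognizer` is an already created `SpeechRecognizer`; the helper name `addBusinessCodeHints` is made up for illustration.

```javascript
// Hypothetical helper (not part of the SDK): registers each business code
// as a phrase hint on an existing recognizer.
// `sdk` is assumed to be the Speech SDK namespace, passed in explicitly so
// the helper works with both the browser bundle and a module import.
function addBusinessCodeHints(sdk, recognizer, codes) {
  // PhraseListGrammar.fromRecognizer returns the phrase list bound to this recognizer.
  const phraseList = sdk.PhraseListGrammar.fromRecognizer(recognizer);
  for (const code of codes) {
    phraseList.addPhrase(code);
  }
  return phraseList;
}
```

A call such as `addBusinessCodeHints(SpeechSDK, recognizer, ["A6", "B12"])` would be made once, before starting recognition.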
Alternatively, you could train/customize your own model.
The below article details the same:
Please note that pronunciation mapping/hints in Azure Speech to Text are currently available only for English and German.
However, I did a quick test with the uttered sentences (related text) approach mentioned in the article, as it has no language restriction.
I created the sample sentences as related text, trained the model, and deployed it. This gave slightly better recognition of the codes/non-grammar words. Sample sentences:
- This is A 20 Business
- There is going be a B 6 Business Model
- B 6 on the other hand is not doing good as a business
- Please indicate the C 26 profits.
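If there are many codes, the related-text file can be generated rather than written by hand. The following is only a sketch: the function name `buildRelatedText` and the `{code}` placeholder are assumptions made for illustration, and the output is simply one sentence per line, ready to be saved as a plain text file.

```javascript
// Hypothetical helper: expands every template with every business code and
// returns one sentence per line.
// Templates use the literal placeholder "{code}" (an assumption of this sketch).
function buildRelatedText(codes, templates) {
  const lines = [];
  for (const template of templates) {
    for (const code of codes) {
      // split/join replaces every occurrence of the placeholder.
      lines.push(template.split("{code}").join(code));
    }
  }
  return lines.join("\n");
}

const relatedText = buildRelatedText(
  ["A 20", "B 6", "C 26"],
  ["This is {code} Business", "Please indicate the {code} profits."]
);
```

Sentences that resemble what users actually say are likely to help more than purely mechanical ones, so the templates are worth writing with real usage in mind.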
Out of the Box Speech Recognition:
After using the custom-trained model for speech recognition:
Having said that, I assume that if we train the model with more data (sentences, or audio with labeled text, which also has no language restriction), the custom model will meet your requirement.
To consume the custom model in JavaScript, you could refer to this article:
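As a rough sketch, pointing the recognizer at a deployed custom model comes down to setting the endpoint ID on the speech configuration. The helper name `createCustomModelRecognizer` and all argument values are placeholders; `sdk` is again assumed to be the Speech SDK namespace, and the endpoint ID is the one shown for the deployment in the Speech portal.

```javascript
// Hypothetical helper: builds a recognizer that uses a deployed custom model.
// All arguments are placeholders to be filled with your own values.
function createCustomModelRecognizer(sdk, subscriptionKey, region, endpointId, language) {
  const speechConfig = sdk.SpeechConfig.fromSubscription(subscriptionKey, region);
  speechConfig.speechRecognitionLanguage = language; // e.g. "fr-FR"
  // endpointId selects the deployed custom model instead of the base model.
  speechConfig.endpointId = endpointId;
  const audioConfig = sdk.AudioConfig.fromDefaultMicrophoneInput();
  return new sdk.SpeechRecognizer(speechConfig, audioConfig);
}
```

The returned recognizer is used exactly like one built on the base model, for example with `recognizeOnceAsync` or `startContinuousRecognitionAsync`.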