
Speech-to-text Recognition is not accurate

I am trying to implement speech-to-text recognition in my React website, using the react-speech-recognition package from npm. I am using the exact code given in the package description on npm.
It works with everyday speech, but when I introduce technical jargon, it goes way off!

Here's what I am trying to say to it (aviation jargon):

Cleared to enter the CTR, not above 1500 feet, join and report on a right downwind runway 19, QNH 1018, squawk 2732

This is what I get in response:

please to enter the city are not above 15 feet heart penetrate join and report on a ride on the wind blown away 9 theme

What else do I need to do to fix the accuracy of the recognition?

That package leverages the SpeechRecognition interface of your browser's Web Speech API. The React library's API lets you get the underlying SpeechRecognition object via a call to the getRecognition() method.

The underlying SpeechRecognition object's API allows for the addition of grammars written in the JSpeech Grammar Format (JSGF). So in theory, you could provide more information about the words you're expecting to hear in your app, and thereby improve recognition accuracy.
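As a rough sketch of what adding a grammar might look like: the helper names `buildJsgfGrammar` and `applyGrammar`, the grammar name, and the term list below are all made up for illustration, and the code assumes the browser exposes `SpeechGrammarList` (possibly under the `webkit` prefix). Browsers may accept a grammar and still ignore it, so treat this as an experiment rather than a guaranteed fix.

```javascript
// Hypothetical list of terms the app expects to hear (illustrative only).
const AVIATION_TERMS = ['cleared', 'CTR', 'downwind', 'runway', 'QNH', 'squawk'];

// Build a JSGF grammar string with a single public rule that lists
// the expected terms as alternatives.
function buildJsgfGrammar(name, terms) {
  const alternatives = terms
    .map((term) => term.trim())
    .filter((term) => term.length > 0)
    .join(' | ');
  return `#JSGF V1.0; grammar ${name}; public <${name}> = ${alternatives} ;`;
}

// Attach the grammar to a SpeechRecognition-like object.
// Returns false when no recognition object is given or the browser
// (passed in as `scope`) has no grammar list constructor.
function applyGrammar(recognition, grammar, scope = globalThis) {
  const GrammarList = scope.SpeechGrammarList || scope.webkitSpeechGrammarList;
  if (!recognition || !GrammarList) {
    return false;
  }
  const list = new GrammarList();
  list.addFromString(grammar, 1); // weight 1 = highest priority
  recognition.grammars = list;
  return true;
}
```

In the React app, this could be wired up as `applyGrammar(SpeechRecognition.getRecognition(), buildJsgfGrammar('aviation', AVIATION_TERMS))` before listening starts, where `SpeechRecognition` is the package's default export.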

But there are caveats, including:

  • Browser support is very limited for speech recognition in general, and for the addition of grammars in particular. If you don't control which browser your users run, the quality of recognition will vary, and it might not work at all unless you use polyfills.
  • Depending on how the speech recognition is implemented, things like hardware configuration and the operating system may affect the results.
  • Speech recognition is an extremely inexact science. The best automatic speech recognition software and services only boast about 85% accuracy, even with ordinary speech. The ones built into your browser probably won't be even that good.
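Given the first caveat, it may be worth checking what the current browser exposes before relying on any of this. The following is a minimal sketch; `detectSpeechSupport` is a hypothetical helper, and it only tests whether the constructors exist, not whether the browser actually honors grammars.

```javascript
// Report which Web Speech API pieces the given global scope exposes.
// `scope` defaults to globalThis (window in a browser); passing it in
// makes the function easy to exercise outside a browser.
function detectSpeechSupport(scope = globalThis) {
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  const GrammarList = scope.SpeechGrammarList || scope.webkitSpeechGrammarList;
  return {
    recognition: typeof Recognition === 'function',
    grammars: typeof GrammarList === 'function',
  };
}
```

If I recall correctly, the package's `useSpeechRecognition` hook also returns a `browserSupportsSpeechRecognition` flag that covers the first of these two checks.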

You may be able to get better accuracy from cloud-based speech services. Azure Cognitive Services, for example, lets you create custom voice models, custom grammars, and so on. Of course, these services charge based on usage, and they charge more if you use customizations.
