To use Google's Speech API directly, you now need an API key. To get one, you must subscribe to the chromium-dev@chromium.org newsgroup and follow a few steps, after which Google issues a developer key that is "not for distribution." The key is rate-limited to 50 requests/day.
For example, node-google-speech-api outlines the need for having this key for a node application to access Google's Speech API directly (without the use of a browser): https://github.com/psirenny/node-google-speech-api
There are also PHP and Java libraries for accessing Google's Speech API, and they require this key as well.
I would like to write a desktop application that uses Google's speech recognition technology, but the 50 requests/day limit is unacceptable for wide distribution, and even for a single desktop deployment of the software I have in mind. I anticipate up to 500 requests/day from an individual desktop user if the voice recognition is split into short requests. If most of it is long-polling/continuous instead, it might be only 2 or 3 requests/day, but each lasting hours. Multiply that by a few hundred users and I'd easily exceed 50 requests/day.
I was trying to think of a way to access Google's superior speech recognition technology from my own desktop app without this limit. The language doesn't matter, but Node.js would likely be part of the mix, so a Node.js solution would be preferred. That brought me to the Web Speech API standard, which Google Chrome happens to implement.
As far as I know, Google Chrome's implementation of the Web Speech API has no hard requests/day limit, so I could happily write websites that use the Web Speech API all day long with minimal or no restrictions compared to calling the Google Speech API directly. That got me thinking: what if I distributed the bona fide Google Chrome browser (not Chromium) with an added "extension" that lets JavaScript in a custom HTML5 web page interface with other applications on the client's system (i.e., a Node.js app running alongside this special installation of Chrome)? I would write the speech recognition portion in JavaScript, Web Speech API style, and pipe its output into the other application that I design and install on clients' systems.
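The page-side half of that idea might look roughly like the sketch below. It assumes Chrome's prefixed `webkitSpeechRecognition` constructor and a companion process listening at a hypothetical local URL (`http://127.0.0.1:8765/transcript`). The helper names `extractFinalTranscripts` and `startRecognition` are made up for illustration, and the local endpoint would have to accept cross-origin requests from the page.

```javascript
// Hypothetical address of the companion desktop app (an assumption, not a real service).
const LOCAL_ENDPOINT = 'http://127.0.0.1:8765/transcript';

// Collect the finalized transcripts from a SpeechRecognitionEvent-like object.
function extractFinalTranscripts(event) {
  const finals = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      // result[0] is the top-ranked alternative for this result.
      finals.push(result[0].transcript.trim());
    }
  }
  return finals;
}

// Start continuous recognition. `Recognition` is the constructor that Chrome
// exposes as window.webkitSpeechRecognition; `send` receives each final transcript.
function startRecognition(Recognition, send) {
  const recognition = new Recognition();
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = false; // only report finalized text
  recognition.lang = 'en-US';
  recognition.onresult = (event) => {
    for (const text of extractFinalTranscripts(event)) {
      send(text);
    }
  };
  recognition.onerror = (event) => console.error('speech error:', event.error);
  recognition.start();
  return recognition;
}

// Browser-only wiring: POST each transcript as JSON to the local companion app.
if (typeof window !== 'undefined' && window.webkitSpeechRecognition) {
  startRecognition(window.webkitSpeechRecognition, (text) =>
    fetch(LOCAL_ENDPOINT, {
      method: 'POST',
      body: JSON.stringify({ transcript: text }),
    }).catch((err) => console.error('could not reach local app:', err))
  );
}
```

Passing the constructor in as a parameter keeps the recognition logic separate from the browser global, so it can be exercised with a stand-in class outside Chrome.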
Would/could that work?
What are the pitfalls of this approach?
Do you have suggestions of another approach or would you perhaps recommend a commercially-licensed solution that is comparable to the ease of use and extreme natural language accuracy of Google's speech technology?
One possible approach to try is a Chrome App.
It will run in a sandboxed instance of Chrome and will be implemented with HTML + JavaScript.
To the user it will look just like a desktop application.
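Whichever container hosts the page, the desktop side needs some local endpoint to receive the recognized text. A minimal sketch of that receiving end, assuming Node.js with only the built-in `http` module, a hypothetical `/transcript` path, and a JSON body of the form `{"transcript": "..."}` (the function names are illustrative, not part of any library):

```javascript
const http = require('http');

// Parse a request body of the assumed form {"transcript": "..."}.
// Returns the transcript string, or null if the payload is malformed.
function parseTranscript(body) {
  try {
    const data = JSON.parse(body);
    return typeof data.transcript === 'string' ? data.transcript : null;
  } catch (err) {
    return null;
  }
}

// Create (but do not start) a server that passes each transcript to `onTranscript`.
function createTranscriptServer(onTranscript) {
  return http.createServer((req, res) => {
    // Permissive CORS for the sketch; a real deployment should name the page's origin.
    res.setHeader('Access-Control-Allow-Origin', '*');
    if (req.method !== 'POST' || req.url !== '/transcript') {
      res.writeHead(404);
      res.end();
      return;
    }
    let body = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      const transcript = parseTranscript(body);
      if (transcript === null) {
        res.writeHead(400); // malformed payload
        res.end();
        return;
      }
      onTranscript(transcript);
      res.writeHead(204); // accepted, nothing to return
      res.end();
    });
  });
}

// Usage (port 8765 is an arbitrary choice; binding to 127.0.0.1 keeps it local):
// createTranscriptServer((text) => console.log('heard:', text))
//   .listen(8765, '127.0.0.1');
```

Binding to the loopback address means only processes on the same machine can post transcripts, which matches the single-desktop scenario described in the question.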