
Real time speech recognition using WebRTC, Node.js and speech recognition engine

A. What I am trying to implement.

A web application allowing real-time speech recognition inside the web browser (like this).

B. Technologies I am currently thinking of using to achieve A.

  • JavaScript
  • Node.js
  • WebRTC
  • Microsoft Speech API or Pocketsphinx.js or something else (cannot use Web Speech API)

C. Very basic workflow

  1. Web browser establishes connection to Node server (server acts as a signaling server and also serves static files)
  2. Web browser acquires audio stream using getUserMedia() and sends user's voice to Node server
  3. Node server passes audio stream being received to speech recognition engine for analysis
  4. Speech recognition engine returns result to Node server
  5. Node server sends text result back to initiating web browser
  6. (Node server performs steps 1 to 5 to process requests from other browsers)

D. Questions

  1. Would Node.js be suitable to achieve C?
  2. How could I pass received audio streams from my Node server to a speech recognition engine running separately from the server?
  3. Could my speech recognition engine be running as another Node application (if I use Pocketsphinx)? So my Node server communicates with my Node speech recognition server.

Would Node.js be suitable to achieve C?

Yes, though there are no hard requirements for that. Some people are running servers with GStreamer; for example, check

http://kaljurand.github.io/dictate.js/

Node should be fine too.

How could I pass received audio streams from my Node server to a speech recognition engine running separately from the server?

There are many ways for node-to-node communication. One of them is http://socket.io. There are also plain sockets. The particular framework depends on your requirements for fault tolerance and scalability.

Could my speech recognition engine be running as another Node application (if I use Pocketsphinx)? So my Node server communicates with my Node speech recognition server.

Yes, sure. You can create a Node module to wrap the PocketSphinx API.

UPDATE: check this, it should be similar to what you need:

http://github.com/cmusphinx/node-pocketsphinx

You should contact Andre Natal, who demonstrated a demo similar to this at the Firefox Summit last fall and is now working on a Google Summer of Code project to implement offline speech recognition in Firefox/FxOS: http://cmusphinx.sourceforge.net/2014/04/speech-projects-on-gsoc-2014/

