C# Speech Recognition - Is this what the user said?

I need to write an application that uses a speech recognition engine (either the one built into Vista or a third-party one) to display a word or phrase and recognise when the user reads it (or an approximation of it). I also need to be able to switch quickly between languages, without changing the language of the operating system.

The users will be using the system for very short periods. The application needs to work without first training the recognition engine to the users' voices.

It would also be fantastic if this could work on Windows XP or lesser versions of Windows Vista.

Optionally, the system needs to be able to read information on the screen back to the user, in the user's selected language. I can work around this specification using pre-recorded voice-overs, but the preferred method would be to use a text-to-speech engine.

Can anyone recommend something for me?

A similar question was asked on Joel on Software a while back. You can use the System.Speech.Recognition namespace to do this...with some limitations. Add System.Speech (should be in the GAC) to your project. Here's some sample code for a WinForms app:

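using System;
using System.Speech.Recognition; // needs a project reference to System.Speech
using System.Windows.Forms;

public partial class Form1 : Form
{
  // Shared Windows recognizer
  SpeechRecognizer rec = new SpeechRecognizer();

  public Form1()
  {
    InitializeComponent();
    rec.SpeechRecognized += rec_SpeechRecognized;
  }

  // Show whatever the recognizer matched
  void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
  {
    lblLetter.Text = e.Result.Text;
  }

  // Assumes this handler is wired to the form's Load event in the designer
  void Form1_Load(object sender, EventArgs e)
  {
    // Build a grammar limited to the strings "0" through "100"
    var c = new Choices();
    for (var i = 0; i <= 100; i++)
      c.Add(i.ToString());
    var gb = new GrammarBuilder(c);
    var g = new Grammar(gb);
    rec.LoadGrammar(g);
    rec.Enabled = true;
  }
}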

This recognizes the numbers from 0 to 100, and displays the resulting number on the form. You'll need a form with a label called lblLetter on it.

Used this way, System.Speech only works with a pre-defined list of words or phrases; it's not exactly NaturallySpeaking, either in versatility or in recognition quality. But you don't have to train it to the user's voice, and if you only have a few different things the user can say, it works reasonably well. And it's free! (if you have Visual Studio)

It won't work well if you use very short phrases; I made a program for my kid to say letters of the alphabet and see them on-screen, but it doesn't do that well since many of the letters sound alike (especially from the mouth of a four-year-old).

As for more flexible options...well, there's the aforementioned NaturallySpeaking, which has an SDK. But you have to contact sales to get any sort of access to it, and no pricing is listed, so it comes across as one of those "How much does it cost? Well, how much have you got?" kind of things. There doesn't seem to be a "download and play around with it" option. :(

As for text-to-speech, System.Speech.Synthesis does this. It's even easier than the speech recognition. I wrote a small program to let me type, hit Enter, and read the text aloud. My four-year-old gets mesmerized by it. :) ("Daddy, I wanna tawk to da wobot.")
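A minimal sketch of the "type, hit Enter, hear it read aloud" idea described above. This is not the original program; the class name SpeakDemo and the "en-US" culture are assumptions for illustration, and the voices actually available depend on what is installed on the machine.

```csharp
using System;
using System.Globalization;
using System.Speech.Synthesis; // needs a project reference to System.Speech

static class SpeakDemo
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // List the installed voices and the language each one speaks.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
                Console.WriteLine("{0} ({1})", voice.VoiceInfo.Name, voice.VoiceInfo.Culture);

            // Ask for a voice in the user's language (assumed here to be en-US).
            // If no installed voice matches the hints, the current voice stays selected.
            synth.SelectVoiceByHints(VoiceGender.NotSet, VoiceAge.NotSet, 0,
                                     new CultureInfo("en-US"));

            // Read each typed line aloud; an empty line ends the program.
            string line;
            while (!string.IsNullOrEmpty(line = Console.ReadLine()))
                synth.Speak(line); // blocks until the line has been spoken
        }
    }
}
```

Speak is synchronous, which keeps the loop simple; SpeakAsync would be the choice if the UI must stay responsive while the text is read.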

[Note: I was the development lead for the managed speech recognition API in .NET 3.0]

System.Speech is part of .NET 3.0, so it is available on both Vista and XP. In Vista you have the added benefit of having a speech recognition engine pre-installed by the OS. On XP your choices are: use the SAPI 5.1 SDK with a very old engine (which might still work well enough for your command and control scenario), or install Office 2003, which installs a newer version of the recognizer. There are a few SAPI 5 compliant speech recognition engines available as well.

If you need to switch languages, you will want to use the System.Speech.Recognition.SpeechRecognitionEngine class, which allows you to choose the SR engine for the language you need to support. Note that engines are defined by the set of languages they support (they might be using the same binary, only swapping data files to support additional languages).
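A sketch of how picking an engine per language might look. The helper name CreateEngine, the phrase list, and the "en-US" culture are hypothetical; whether an engine exists for a given culture depends on what is installed.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition; // needs a project reference to System.Speech

static class LanguageSwitchDemo
{
    // Hypothetical helper: builds an in-process engine for one culture
    // with a grammar limited to the given phrases.
    static SpeechRecognitionEngine CreateEngine(string cultureName, params string[] phrases)
    {
        var culture = new CultureInfo(cultureName);

        // Expected to throw ArgumentException when no installed
        // recognizer supports this culture.
        var engine = new SpeechRecognitionEngine(culture);

        var builder = new GrammarBuilder(new Choices(phrases));
        builder.Culture = culture; // the grammar's culture has to match the engine's
        engine.LoadGrammar(new Grammar(builder));
        engine.SetInputToDefaultAudioDevice();
        return engine;
    }

    static void Main()
    {
        // Show which recognizers (and therefore languages) are installed.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
            Console.WriteLine("{0}: {1}", info.Culture, info.Name);

        using (var engine = CreateEngine("en-US", "hello", "goodbye"))
        {
            // Listen once; the result is null if nothing was recognized in time.
            RecognitionResult result = engine.Recognize(TimeSpan.FromSeconds(5));
            Console.WriteLine(result == null ? "(nothing recognized)" : result.Text);
        }
    }
}
```

Switching language then amounts to disposing one engine and creating another with a different culture name, without touching the operating system's language settings.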

Comment if you need to know more.

Philipp

Before this, add a reference to the speech assembly:

System.Speech

I found that the code example posted by Kyralessa on Oct 22nd didn't work for me, but a slightly revised version did. When adding strings to the Choices object, use full-text English words, not numerals. It seems the MS speech recognition engine can't recognize numerals by themselves.

I have marked these modifications with comments added to the previous example.

public partial class Form1 : Form
{
  SpeechRecognizer rec = new SpeechRecognizer();

  public Form1()
  {
    InitializeComponent();
    rec.SpeechRecognized += rec_SpeechRecognized;
  }

  void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
  {
    lblLetter.Text = e.Result.Text;
  }

  void Form1_Load(object sender, EventArgs e)
  {
    var c = new Choices();

    // Doesn't work; must use English words to add to Choices and
    // populate the grammar.
    //
    //for (var i = 0; i <= 100; i++)
    //  c.Add(i.ToString());

    c.Add("one");
    c.Add("two");
    c.Add("three");
    c.Add("four");
    // etc...

    var gb = new GrammarBuilder(c);
    var g = new Grammar(gb);
    rec.LoadGrammar(g);
    rec.Enabled = true;
  }
}
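Rather than typing out every word by hand, the "etc..." could be generated. The helper below is a hypothetical sketch (the NumberWords class is not part of System.Speech) that spells out 0 to 100 in English. Whether the engine prefers "forty two" or "forty-two" is an assumption that would need testing.

```csharp
using System;

static class NumberWords
{
    static readonly string[] Small =
    {
        "zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
    };

    static readonly string[] Tens =
    {
        "", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"
    };

    // Spells out an integer from 0 to 100 as English words.
    public static string ToWords(int n)
    {
        if (n < 0 || n > 100)
            throw new ArgumentOutOfRangeException("n");
        if (n == 100)
            return "one hundred";
        if (n < 20)
            return Small[n];

        string tens = Tens[n / 10];
        return n % 10 == 0 ? tens : tens + " " + Small[n % 10];
    }
}
```

With this in place, the commented-out loop could add NumberWords.ToWords(i) instead of i.ToString().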

If the engine is what you're asking about, then here is what I've found (beware, I'm just listing; I haven't tried any of them):

Lumenvox engine

You also have the SAPI SDK from Microsoft itself. I've only tried it for text to speech, but according to its definition:

The SDK also includes freely distributable text-to-speech (TTS) engines (in US English and Simplified Chinese) and speech recognition (SR) engines (in US English, Simplified Chinese, and Japanese).

Be warned that you're not going to get good results if you don't require training first. Speech recognition is a statistical application of phonetics, a field which is pretty frank about the fact that there's so much variation in the signal that it's almost a miracle anyone can understand what anyone else says. An off-the-shelf speech recognition engine will most likely tend towards a more general accent of English, but will fail miserably for anything even slightly different.

That's why training is so important. We can do well by overfitting with ease, especially if we reduce the problem space. But creating an extensible machine learning solution? Therein always lies the rub.

That being said, consider Sphinx-4. It's an off-the-shelf solution written in Java, available at http://cmusphinx.sourceforge.net/sphinx4/

Dragon Naturally Speaking SDK might be worth looking at. This project looked interesting.

Haven't got to play with either of them though.

Text to speech is available with the Speech API. Personally, I'd probably require Vista and use the managed interfaces in System.Speech.Recognition and System.Speech.Synthesis, but P/Invoke into the unmanaged APIs should be possible if you really need XP support.

Try Microsoft Speech Server, which I think is now part of Office Communication Server 2007. It contains SR/TTS engines, a C# API, and tools that integrate with Visual Studio.

This is the article from MSDN magazine that first discussed using the System.Speech APIs for Vista. Some of it is out of date because the API changed between the beta (when the article was written) and the release of Vista, but this is still one of the best resources I've found and gives a good intro to the System.Speech namespace. See http://msdn.microsoft.com/en-us/magazine/cc163663.aspx

Well, this question already has many good responses, but I think it is worth updating the responses from Rob Segal and Philipp Schmid with some information from the 2016 documentation, which includes this nice code example:

https://msdn.microsoft.com/en-us/library/office/system.speech.recognition.speechrecognitionengine.aspx

It does not use the Windows shared recognizer (the little Windows mic that shows up in the middle of the screen); it uses an in-app SpeechRecognitionEngine that does not need any visual cue. The UI is completely under your control.
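As a rough illustration of the in-app approach applied to the original question (did the user read the displayed phrase?), here is a sketch. It is not the MSDN sample: the ReadAloudCheck class, the IsAcceptable rule, the target phrase, and the 0.6 confidence threshold are all assumptions that would need tuning.

```csharp
using System;
using System.Speech.Recognition; // needs a project reference to System.Speech

static class ReadAloudCheck
{
    // Hypothetical acceptance rule: the recognized text must equal the
    // displayed phrase and the engine's confidence must reach a threshold.
    public static bool IsAcceptable(string expected, string heard,
                                    float confidence, float minConfidence)
    {
        return confidence >= minConfidence &&
               string.Equals(expected, heard, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        const string target = "the quick brown fox"; // phrase shown to the user
        const float minConfidence = 0.6f;            // assumed threshold

        // In-process engine using the default installed recognizer.
        // LoadGrammar may fail if the grammar's culture differs from the engine's.
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new Grammar(new GrammarBuilder(target)));
            engine.SetInputToDefaultAudioDevice();

            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine(
                    IsAcceptable(target, e.Result.Text, e.Result.Confidence, minConfidence)
                        ? "Match"
                        : "Heard the phrase, but with low confidence");

            engine.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("Speech did not match the phrase");

            // Keep listening until the user presses Enter.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Read the phrase aloud; press Enter to quit.");
            Console.ReadLine();
            engine.RecognizeAsyncStop();
        }
    }
}
```

Keeping the acceptance rule in its own method makes it easy to loosen later, for example to accept an approximation of the phrase rather than an exact match.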
