Speech recognition quality is very poor, especially compared to Word

Question

I'm using the WPF speech recognition library, trying to use it in a desktop app as an alternative to menu commands. (I want to focus on the tablet experience, where you don't have a keyboard.) It works, sort of, except that the recognition accuracy is so bad that it's unusable. So I tried dictating into Word, and Word worked reasonably well. I'm using my built-in laptop microphone in both cases, and both programs are capable of hearing the same speech simultaneously (provided Word retains keyboard focus), but Word gets it right and WPF does an abysmal job.

I've tried both a generic DictationGrammar() and a tiny specialised grammar, and I've tried both "en-US" and "en-AU", and in all cases Word performs well and WPF performs poorly. Even comparing the specialised grammar in WPF to the general grammar in Word, WPF gets it wrong 50% of the time, e.g. hearing "size small" as "color small".

    private void InitSpeechRecognition()
    {
        recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));

        // Create and load a grammar.  
        if (false)
        {
            GrammarBuilder grammarBuilder = new GrammarBuilder();
            Choices commandChoices = new Choices("weight", "color", "size");
            grammarBuilder.Append(commandChoices);
            Choices valueChoices = new Choices();
            valueChoices.Add("normal", "bold");
            valueChoices.Add("red", "green", "blue");
            valueChoices.Add("small", "medium", "large");
            grammarBuilder.Append(valueChoices);
            recognizer.LoadGrammar(new Grammar(grammarBuilder));
        }
        else
        {
            recognizer.LoadGrammar(new DictationGrammar());
        }

        // Add a handler for the speech recognized event.  
        recognizer.SpeechRecognized +=
                            new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

        // Configure input to the speech recognizer.  
        recognizer.SetInputToDefaultAudioDevice();

        // Start asynchronous, continuous speech recognition.  
        recognizer.RecognizeAsync(RecognizeMode.Multiple);
    }

Sample results from Word:

Hello 
make it darker 
I want a brighter colour 
make it reader 
make it greener 
thank you 
make it bluer 
make it more blue
make it darker 
turn on debugging 
turn off debugging 
zoom in 
zoom out

The same audio in WPF, dictation grammar:

a lower
make it back
when Ted Brach
making reader
and he
liked the
ethanol and
act out
to be putting
it off the parking
zoom in
and out

I got the assembly using NuGet. I'm using Runtime version=v4.0.30319 and version=4.0.0.0. If I'm supposed to "train" it, the documentation doesn't explain how to do this, and I don't know if the training is shared with other programs such as Word, or where the training is saved. I've been playing around with it long enough now for it to know the sound of my voice.

Can anyone tell me what I'm doing wrong?

Answer 1

This is expected. Word's dictation uses a cloud-based, AI/ML-assisted speech service: Azure Cognitive Services - Speech to Text. It is constantly being trained and updated for the best accuracy. You can easily test this by going offline and trying the dictation feature in Word: it won't work.

.NET's System.Speech uses the offline SAPI5, which hasn't been updated since Windows 7 as far as I'm aware. The core technology itself (Windows 95 era) is much older than what is available on today's phones or cloud-based services. Microsoft.Speech.Recognition uses a similar core and won't be much better, although you can give it a try.

If you want to explore other offline options, I would suggest trying Windows.Media.SpeechRecognition. As far as I'm aware, it is the same technology used by Cortana and other modern voice recognition apps on Windows 8 and up, and it does not use SAPI5.

It's pretty easy to find examples for Azure or Windows.Media.SpeechRecognition online. The best way to use the latter would be to update your app to .NET 5 and use C#/WinRT to access the UWP APIs.
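
As a rough illustration of the cloud route, a single-utterance recognition with the Azure Speech SDK might look like the sketch below. It assumes the Microsoft.CognitiveServices.Speech NuGet package is installed; the key and region strings are placeholders for your own Speech resource, and the names AzureDictationSketch and RecognizeOnceFromMicrophoneAsync are made up for this example.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

public static class AzureDictationSketch
{
    // Hypothetical helper: returns the recognized text, or null if nothing was matched.
    public static async Task<string> RecognizeOnceFromMicrophoneAsync()
    {
        // Placeholder credentials: substitute the key and region of your own Speech resource.
        var config = SpeechConfig.FromSubscription("YOUR_SPEECH_KEY", "YOUR_REGION");
        config.SpeechRecognitionLanguage = "en-US";

        // With no AudioConfig supplied, the recognizer listens on the default microphone.
        using (var recognizer = new SpeechRecognizer(config))
        {
            SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();
            switch (result.Reason)
            {
                case ResultReason.RecognizedSpeech:
                    return result.Text;
                case ResultReason.Canceled:
                    // Typically a bad key, wrong region, or no network connection.
                    var details = CancellationDetails.FromResult(result);
                    throw new InvalidOperationException(
                        $"Recognition canceled: {details.Reason} {details.ErrorDetails}");
                default:
                    return null; // e.g. ResultReason.NoMatch
            }
        }
    }
}
```

Unlike System.Speech, this needs a network connection and an Azure subscription, which is the trade-off for the better accuracy described above.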

Answer 2

Your best bet, I would say, is to use not a DictationGrammar but specific grammars with whole phrases or with key-value assignments:

// Required namespaces (CultureInfo is used unqualified below).
using System.Globalization;
using System.Speech.Recognition;

private static SpeechRecognitionEngine CreateRecognitionEngine()
{
    var cultureInf = new CultureInfo("en-US");

    var recoEngine = new SpeechRecognitionEngine(cultureInf);
    recoEngine.SetInputToDefaultAudioDevice();

    // One grammar per key, so each key only accepts its own values.
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "weight", new string[] { "normal", "bold", "demibold" }));
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "color", new string[] { "red", "green", "blue" }));
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "size", new string[] { "small", "medium", "large" }));

    // An empty key produces a grammar of whole phrases.
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "", new string[] { "Put whole phrase here", "Put whole phrase here again", "another long phrase" }));

    return recoEngine;
}

static Grammar CreateKeyValuesGrammar(CultureInfo cultureInf, string key, string[] values)
{
    // Start with the key phrase (if any), then append the allowed values as alternatives.
    var grBldr = string.IsNullOrWhiteSpace(key) ? new GrammarBuilder() { Culture = cultureInf } : new GrammarBuilder(key) { Culture = cultureInf };
    grBldr.Append(new Choices(values));

    return new Grammar(grBldr);
}
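
Once the recognizer reports a phrase, the application still has to decide whether to act on it. The helper below is one possible sketch of that step: it takes the recognized text and its confidence, and only returns a key and value when the confidence is high enough and the value belongs to that key's category. VoiceCommands, TryParse, and the minConfidence parameter are assumptions made for illustration; they are not part of System.Speech.

```csharp
using System;
using System.Collections.Generic;

public static class VoiceCommands
{
    // Allowed values per key, mirroring the grammars loaded above.
    private static readonly Dictionary<string, string[]> Allowed =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["weight"] = new[] { "normal", "bold", "demibold" },
            ["color"] = new[] { "red", "green", "blue" },
            ["size"] = new[] { "small", "medium", "large" },
        };

    // Returns true only for a confident "<key> <value>" phrase whose value fits the key.
    public static bool TryParse(string text, float confidence, float minConfidence,
                                out string key, out string value)
    {
        key = null;
        value = null;

        if (confidence < minConfidence || string.IsNullOrWhiteSpace(text))
            return false;

        // Split into the key and the remainder of the phrase.
        string[] parts = text.Trim().Split(new[] { ' ' }, 2, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 2)
            return false;

        if (!Allowed.TryGetValue(parts[0], out string[] values))
            return false;

        string candidate = parts[1].Trim();
        int index = Array.FindIndex(values,
            v => string.Equals(v, candidate, StringComparison.OrdinalIgnoreCase));
        if (index < 0)
            return false;

        key = parts[0].ToLowerInvariant();
        value = values[index];
        return true;
    }
}
```

In a SpeechRecognized handler this could be called as `VoiceCommands.TryParse(e.Result.Text, e.Result.Confidence, 0.7f, out var key, out var value)`. The 0.7 threshold is an arbitrary starting point that would need tuning against real recordings.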

You may also try Microsoft.Speech.Recognition; see "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?"

Answer 3

As you are actually creating a voice user interface and not only doing speech recognition, you should check out Speechly. With Speechly it's a lot easier to create natural experiences that don't require hard-coded commands but rather support multiple ways of expressing the same thing. Integrating it into your application should be pretty simple, too. There's a small CodePen on the front page to get a basic understanding.

Answer 4

If anyone needs a speech recognition engine with about 90% of the accuracy of Cortana, they should follow these steps.

Step 1) Download the NuGet package Microsoft.Windows.SDK.Contracts.

Step 2) Migrate the project from packages.config to PackageReference so it can reference the SDK --> https://devblogs.microsoft.com/nuget/migrate-packages-config-to-package-reference/

The above-mentioned SDK gives you the Windows 10 speech recognition system within Win32 apps. This has to be done because the only other way to use this speech recognition engine is to build a Universal Windows Platform application. I don't recommend making an AI application in the Universal Windows Platform because of its sandboxing. The sandbox isolates the app in a container, does not allow it to communicate with any hardware, makes file access an absolute pain, and makes thread management impossible: only async functions are available.

Step 3) Add this namespace to the using directives at the top of the file. It contains all the functions related to online speech recognition.

using Windows.Media.SpeechRecognition;

Step 4) Add the speech recognition implementation.

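Task.Run(async () =>
{
    try
    {
        using (var speech = new SpeechRecognizer())
        {
            // Constraints must be compiled before recognition starts.
            await speech.CompileConstraintsAsync();
            SpeechRecognitionResult result = await speech.RecognizeAsync();

            // WPF controls may only be updated from the UI thread.
            TextBox1.Dispatcher.Invoke(() => TextBox1.Text = result.Text);
        }
    }
    catch (Exception ex)
    {
        // Log failures instead of swallowing them silently.
        System.Diagnostics.Debug.WriteLine(ex);
    }
});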

Most methods of the Windows 10 SpeechRecognizer class must be called asynchronously, which means you must await them inside an async method, an async Task method, or a Task.Run(async () => { ... }) lambda.

For this to work, go to Settings -> Privacy -> Speech in the OS and check that online speech recognition is allowed.
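
For a command-style interface like the one in the question, the same recognizer can be restricted to a fixed list of phrases instead of free dictation. The following is a minimal sketch under that assumption; CommandListeningSketch, ListenForCommandAsync, the phrase list, and the "commands" tag are invented for illustration, while the Windows.Media.SpeechRecognition types are the ones provided by the SDK contracts package from Step 1.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

public static class CommandListeningSketch
{
    // Hypothetical helper: returns the recognized command, or null if none was heard confidently.
    public static async Task<string> ListenForCommandAsync()
    {
        using (var recognizer = new SpeechRecognizer())
        {
            // Only these exact phrases can be recognized.
            var commands = new[]
            {
                "size small", "size medium", "size large",
                "color red", "color green", "color blue"
            };
            recognizer.Constraints.Add(new SpeechRecognitionListConstraint(commands, "commands"));

            SpeechRecognitionCompilationResult compiled = await recognizer.CompileConstraintsAsync();
            if (compiled.Status != SpeechRecognitionResultStatus.Success)
                return null;

            SpeechRecognitionResult result = await recognizer.RecognizeAsync();

            // Ignore low-confidence and rejected results.
            bool confident = result.Confidence == SpeechRecognitionConfidence.High
                          || result.Confidence == SpeechRecognitionConfidence.Medium;

            return result.Status == SpeechRecognitionResultStatus.Success && confident
                ? result.Text
                : null;
        }
    }
}
```

Restricting the phrase list trades flexibility for accuracy, which tends to suit a menu-replacement scenario better than open dictation.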

Answer 5

A simple solution would be to use the Dictate function available in Word for Office 365. Everything else, such as grammar and language, is taken care of by the Dictate function.

To access the Dictate function in Word for Office 365, use the code below.

Application.CommandBars.ExecuteMso("Dictate")

5 answers

Answer 1: score 2, accepted, 2021-05-17 17:55:44
Answer 2: score 1, 2021-05-12 16:26:58
Answer 3: score 0, 2021-04-21 10:39:12
Answer 4: score 0, 2021-11-19 22:22:13
Answer 5: score 0, 2022-09-18 07:13:20
