Azure Speech to Text ignores numbers

Question

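I am using Azure Speech to Text to find the timestamps of utterances in a WAV file.

The problem I'm running into is that if the user records numbers, for example "I'm going to count to three. One, two, three, here I come", the numbers are omitted from the output. This happens in English and in other languages. I can understand omitting utterances like "eh" and "ah", but numbers? Why is this the default?

I am using: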

speechConfig.OutputFormat = OutputFormat.Detailed;
The default language model.

Can I configure the SpeechRecognizer somehow so that it also outputs numbers?

Answer 1

With the following code I was able to convert a .wav audio file to text without losing data.

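using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Replace the placeholders with your own key, region, and file path.
string speechKey = "<Your_Key>";
string speechRegion = "<Your_Region>";

var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
speechConfig.SpeechRecognitionLanguage = "en-US";

using var audioConfig = AudioConfig.FromWavFileInput("<Path to File>");
using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

// Recognize a single utterance from the file and print its text.
var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();
Console.WriteLine(speechRecognitionResult.Text);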

Output: (screenshot not preserved)

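However, there appears to be a bug in the transcription model: if there is a pause in the middle of "I'm going to count to three. One, two, three, here I come", the model omits the "One, two, three, here I come" part of the audio file.
Also, I could not find anything in the Microsoft documentation for the AudioConfig class that configures audio settings related to this issue.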
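
A possible explanation for the truncation, not confirmed in this thread, is that RecognizeOnceAsync returns after a single utterance, and the end of that utterance is detected from silence, so anything after a pause is not transcribed by that call. A minimal sketch of continuous recognition, which keeps listening until the file ends, might look like the following. The key, region, and path are the same placeholders used above, and the transcript and done variables are names chosen for illustration only.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        // Placeholders: replace with your own key, region, and file path.
        var speechConfig = SpeechConfig.FromSubscription("<Your_Key>", "<Your_Region>");
        speechConfig.SpeechRecognitionLanguage = "en-US";

        using var audioConfig = AudioConfig.FromWavFileInput("<Path to File>");
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        var transcript = new StringBuilder();
        var done = new TaskCompletionSource<bool>();

        // Fires once per recognized utterance, so text after a pause is kept.
        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                transcript.AppendLine(e.Result.Text);
            }
        };

        // Either event signals that no more results will arrive
        // (for file input, cancellation is raised at end of stream).
        recognizer.Canceled += (s, e) => done.TrySetResult(false);
        recognizer.SessionStopped += (s, e) => done.TrySetResult(true);

        await recognizer.StartContinuousRecognitionAsync();
        await done.Task;
        await recognizer.StopContinuousRecognitionAsync();

        Console.WriteLine(transcript.ToString());
    }
}
```

Each Recognized event carries one utterance, so the full sentence should arrive as two or more results that are concatenated here.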

Answer 2

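I found the reason the numbers were missing from my results: the bug was in my own code. In my post-processing I was trying to strip punctuation from the result, and I accidentally stripped the numbers as well.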
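
The original post-processing code is not shown, so the following is only a sketch of one way to strip punctuation without touching digits. TranscriptCleaner and StripPunctuation are hypothetical names; the approach assumes that removing characters classified as punctuation by char.IsPunctuation is sufficient.

```csharp
using System;
using System.Linq;

public static class TranscriptCleaner
{
    // Hypothetical helper: removes only Unicode punctuation characters,
    // leaving letters, digits, and whitespace untouched.
    public static string StripPunctuation(string text) =>
        new string(text.Where(c => !char.IsPunctuation(c)).ToArray());
}
```

For example, TranscriptCleaner.StripPunctuation("Count to 3. 1, 2, 3, here I come.") returns "Count to 3 1 2 3 here I come". A pattern that keeps only letters and spaces would have dropped the digits too, which matches the symptom described in the question.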

2 answers

Answer 1
0 2023-01-16 12:58:42

Answer 2
0 accepted 2023-01-17 14:41:36
